

1541-1672/07/$25.00 © 2007 IEEE. IEEE Intelligent Systems. Published by the IEEE Computer Society.

Recommender Systems

Collaborative Filtering Using Dual Information Sources

Jinhyung Cho, Kwiseok Kwon, and Yongtae Park, Seoul National University

With the proliferation of e-commerce, recommender systems have become an important component of personalized e-commerce services and are essential for e-commerce providers to remain competitive. One of the most successful recommendation techniques is collaborative filtering (CF), whose performance has been proved in various e-commerce applications.1,2 CF automates the word-of-mouth process.3 It forms a predictor group that serves as an information source for recommendations.

However, conventional CF methods suffer from a few fundamental limitations such as the cold-start problem, the data-sparsity problem, and the recommender-reliability problem.4,5 Thus, they have trouble dealing with high-involvement, knowledge-intensive domains such as e-learning video on demand (VoD). To overcome these problems, researchers have proposed recommendation techniques such as a hybrid approach combining CF with content-based filtering.4 Because e-commerce Web sites for e-learning often have various product categories, extracting the many attributes of these categories for content-based filtering is extremely burdensome. So, it might be practical to overcome these limitations by improving the CF method itself.

Conventional CF methods base their recommendations on a single recommender group. Our CF method forms dual recommender groups (a similar-users group and an expert-users group) as credible information sources. Then, it analyzes each group's influence on the target customers for the target product categories.

Using this method, we've developed DISCORS (Dual-Information-Source-Model-Based Collaborative Recommender System) and applied it to a high-involvement product: e-learning VoD content. In experiments, DISCORS outperformed conventional CF methods in situations involving variations in the product domain and in data sparsity.

CF from the consumer-psychology viewpoint

When deciding what to purchase, consumers depend on a variety of information sources and have different acceptance levels for each source. Influencing factors can be the product domain characteristics, the consumer's degree of involvement with the product, and the user's level of knowledge about the product.6

In the real world, a person making decisions about movies or daily necessities will seek the opinions of neighbors with similar preferences. On the other hand, when choosing expensive products or services for long-term use, such as a notebook computer or an educational program, an individual's decision is strongly influenced by people with professional expertise in that field. Customer preferences for recommendation sources might also differ within a product domain. For example, when choosing a movie, some customers prefer neighbors' opinions while others prefer experts' opinions.

In this article, source diversity refers to the variety of information sources, and source receptivity refers to the level of a customer's acceptance of a source. Source receptivity can differ across customers or with the involvement level of product domains; we call this heterogeneous source receptivity. Product involvement refers to the level of personal relevance, that is, the level of importance of a product or one's interest in it.7 High-involvement products are those for which the buyer is prepared to spend considerable time and effort in searching. Low-involvement products are bought frequently with a minimum of thought and effort because they aren't of vital concern and have no great impact on the consumer's lifestyle. Unfortunately, existing CF methods don't consider source diversity, heterogeneous source receptivity, or product involvement.

Conventional collaborative-filtering methods use only one information source to provide recommendations. Using two sources, similar users and expert users, enables more effective, more adaptive recommendations.

Similarity-based CF and its limitations

In traditional CF methods, the single recommender group comprises the nearest neighbors with preferences similar to those of a target user. So, these methods are called similarity-based CF (SCF).

As we mentioned before, SCF methods suffer from the recommender-reliability problem. That is, a recommender might not be reliable for a given item or set of items, even though the recommender's and the target user's preferences are similar.5 For example, when looking for movie recommendations, we'll often turn to our friends, on the basis that we have similar movie preferences overall. However, a particular friend might not be reliable when it comes to recommending a particular type of movie.

Trust-based CF and its limitations

To solve the recommender-reliability problem, researchers have proposed trust-based CF.5,8,9 Such methods derive the neighbors' trust explicitly or implicitly and use it as a supplementary criterion of similarity to select more credible neighbors.

Each trust-based CF method employs a different meaning of trust. For example, trust can imply trustworthiness, that is, how much a user can trust other users in a trust network. Such trust-aware CF9 uses an explicitly rated trust value to select trustworthy users as a recommender group. In this way, it solves the recommender-reliability problem. However, it doesn't account for source diversity or heterogeneous source receptivity.

Second, trust can imply expertise or competency, that is, a user's ability to make an accurate recommendation.5,8 In this case, CF can account for source diversity and recommender reliability by forming a recommender group based on both similarity and expertise. To do this, it uses the product or mean of the values for similarity and expertise. However, because this method equally weights similarity and expertise when combining them, without considering a variety of user or product-domain characteristics, it doesn't account for heterogeneous source receptivity.

Group influence and the dual-information-source model

Most people belong to a number of different groups and perhaps would like to belong to several others. However, not all groups exert the same amount of influence on an individual. Sociologists use reference group to refer to those groups that can modify or reinforce an individual's attitudes. Group-behavior theory in consumer psychology has adopted the reference-group concept to consumer behavior. It holds that two reference groups, similar users and experts, strongly influence a consumer's buying decision and that consumers perceive these groups as credible information sources.7

Similarity- and trust-based methods view CF from a personal-influence perspective (see figure 1a). On the basis of group-behavior theory, our method views CF from a group-influence perspective (see figure 1b). It builds information sources for the recommendations to a target user in accordance with two criteria: similarity and expertise. As we mentioned before, it employs dual recommender groups (a similar-user group and an expert-user group) as the information sources.

This model overcomes the recommender-reliability problem by using not just the similarity criterion but also the expertise criterion, and it accounts for source diversity by utilizing multiple recommender groups. In addition, it accounts for heterogeneous source receptivity by determining each group's level of influence on an individual user (source receptivity is the same concept as group influence, but from the user's viewpoint). Consequently, a more personalized recommendation is possible, taking into account variations in product domains and user tendencies.

MAY/JUNE 2007 www.computer.org/intelligent 31

Figure 1. Two views of collaborative filtering: (a) Similarity-based CF and trust-based CF take a personal-influence perspective. (b) DISCORS takes a group-influence perspective.

Redefining expertise and trustworthiness

According to the source-credibility model, proposed in consumer-psychology studies on word-of-mouth communication, an information source's credibility comprises expertise, trustworthiness, similarity, and attraction.7 Here, expertise is the extent to which a source is perceived as being capable of providing correct information, while trustworthiness implies the degree to which a source is perceived as providing information reflecting that source's actual feelings or opinions.

On the basis of this understanding of expertise, we define the expert-users group as users who have been carrying out a number of activities in a category that includes the target item, so that they have a high probability of giving accurate recommendations to other users. We measure expertise by incorporating an appropriate measurement and various factors for the weights.

DISCORS

Owing to recent advances in multimedia and network technologies, e-learning has become a promising alternative to traditional classroom learning. Web-based e-learning content services offer thousands of online courses. Currently, most e-learning content providers still offer all learners the same content, failing to satisfy individual learners. So, to provide more personalized content delivery, thereby increasing their competitiveness, they need to offer more relevant recommendation methods.

Unfortunately, most recommendation methods focus on relatively low-involvement, entertainment product domains such as movies, cell-phone wallpaper images, and music. So, we developed DISCORS as a viable alternative. Also, for high-involvement, knowledge-intensive product domains such as e-learning VoD, collecting sufficient explicit rating data from customers is difficult. To overcome this difficulty, DISCORS employs Web-usage mining to create users' rating profiles from their implicit Web-usage behavior.

The DISCORS recommendation process combines offline mining and online recommendation (see figure 2).

Offline mining

This subprocess has three phases: create each user's rating profile, form the dual information sources, and extract each user's source receptivity.

Creating user rating profiles. These profiles describe a user's preference regarding each item by mining the Web-usage data collected on the e-learning VoD Web site. DISCORS constructs each profile according to the three basic steps of online VoD service use: click-through, preview, and payment. The relative frequency with which a user performs these steps for an item serves as an implicit preference rating; we assume that if the usage frequency for an item is relatively high, the user has a high preference for that item.

We define the user rating profile, R_{u,i}, by modifying a previous approach10 to make it suitable for e-learning VoD content service:

$$R_{u,i} = \frac{R^{c}_{u,i} - \min_{1 \le j \le m} R^{c}_{u,j}}{\max_{1 \le j \le m} R^{c}_{u,j} - \min_{1 \le j \le m} R^{c}_{u,j}} + \frac{R^{v}_{u,i} - \min_{1 \le j \le m} R^{v}_{u,j}}{\max_{1 \le j \le m} R^{v}_{u,j} - \min_{1 \le j \le m} R^{v}_{u,j}} + \frac{R^{p}_{u,i} - \min_{1 \le j \le m} R^{p}_{u,j}}{\max_{1 \le j \le m} R^{p}_{u,j} - \min_{1 \le j \le m} R^{p}_{u,j}} \qquad (1)$$

where R_{u,i} is the rating profile of user u for item i, and m is the number of items. R^{c}_{u,i}, R^{v}_{u,i}, and R^{p}_{u,i} are the numbers of click-throughs, previews, and payments by a user for each item. The value of R_{u,i} is a sum of the normalized values of R^{c}_{u,i}, R^{v}_{u,i}, and R^{p}_{u,i}. It ranges from 0 to 3, with a larger value indicating a stronger preference. It increases with the frequency of each step: click-through, preview, and payment. Although each step's weights appear equal in equation 1, they aren't, because customers who purchased specific content not only clicked the related Web pages but also previewed the content. So, R_{u,i}, which is used in the subsequent phases, is the normalized and weighted sum of R^{c}_{u,i}, R^{v}_{u,i}, and R^{p}_{u,i}.
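As a concrete illustration of equation 1's min-max normalization, the sketch below builds implicit ratings from hypothetical per-item usage counts. The item names and counts are invented for illustration; the paper's extra weighting of the three steps emerges from the data itself and isn't modeled separately here.

```python
# Hypothetical usage counts per item for one user:
# (click-throughs, previews, payments). Illustrative data only.
counts = {
    "item1": (5, 2, 1),
    "item2": (9, 4, 0),
    "item3": (1, 0, 0),
}

def normalize(values):
    """Min-max normalize counts to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

items = list(counts)
clicks, previews, payments = zip(*(counts[i] for i in items))
norm = [normalize(col) for col in (clicks, previews, payments)]

# R_{u,i} = sum of the three normalized step frequencies, so it lies in [0, 3].
ratings = {item: sum(norm[step][k] for step in range(3))
           for k, item in enumerate(items)}
```

A user who clicked, previewed, and paid for an item at the per-step maxima gets the maximum rating of 3; a user at the minima gets 0.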

Figure 2. The DISCORS recommendation process combines offline mining and online recommendation.

Forming the dual information sources. In this phase's first step, DISCORS selects users similar to a target user a and calculates their relative preference for a target item i.

We define similar users as a group of users with preference ratings similar to those the target user has had. To measure similarity, we employ Pearson's correlation coefficient, which is the most widely used in conventional CF methods. We define the similarity between a target user a and another user u, s(u, a), as

$$s(u,a) = \frac{\sum_{i=1}^{m} (R_{u,i} - \bar{R}_u)(R_{a,i} - \bar{R}_a)}{\sqrt{\sum_{i=1}^{m} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{i=1}^{m} (R_{a,i} - \bar{R}_a)^2}} \qquad (2)$$

where R_{u,i} is the rating of user u for item i (over the items co-rated by the two users), and \bar{R}_u is user u's average rating.

On the basis of previous research,1 we select the users whose similarity with the target user exceeds a threshold of 0.3 as the target user's neighbors.

We define the prediction term of similar users (neighbors), S(a, i), as the similarity-weighted sum of the similar users' relative preference compared to their average preference. We calculate this term as

$$S(a,i) = \frac{\sum_{u=1}^{n} s(u,a)\,(R_{u,i} - \bar{R}_u)}{\sum_{u=1}^{n} s(u,a)} \qquad (3)$$

where R_{u,i} is the rating of similar user u for item i, and n is the number of users similar to target user a.
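Equations 2 and 3 can be sketched as follows. The dictionary-based rating profiles and the 0.3 threshold follow the text; returning None when no neighbor rated the item is our assumption, anticipating the single-source fallback described later.

```python
from math import sqrt

def pearson(ru, ra):
    """Pearson correlation over the items co-rated by two users (equation 2)."""
    common = [i for i in ru if i in ra]
    if not common:
        return 0.0
    mu = sum(ru[i] for i in common) / len(common)
    ma = sum(ra[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (ra[i] - ma) for i in common)
    den = (sqrt(sum((ru[i] - mu) ** 2 for i in common))
           * sqrt(sum((ra[i] - ma) ** 2 for i in common)))
    return num / den if den else 0.0

def similar_users_term(target, others, item, threshold=0.3):
    """Similarity-weighted relative preference of neighbors for item (equation 3)."""
    num = den = 0.0
    for ru in others:
        s = pearson(ru, target)
        if s > threshold and item in ru:
            mean_u = sum(ru.values()) / len(ru)
            num += s * (ru[item] - mean_u)
            den += s
    return num / den if den else None  # None: no similar users rated the item
```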

In the second step, DISCORS selects the category experts and calculates their prediction term. As part of this step, we devised a measure of expertise reflecting the user's activity and prediction competency. Expertise can be measured at the total-item level (a "movie expert"), the category level (an "action-movie expert"), or the individual-item level (a "Titanic expert").5 For recommendation, expertise measured in a more specific domain can be more predictive. However, individual-item-level expertise isn't meaningful for real-world recommendations ("He's an expert on the movie Titanic. So what?").

So, in this study, we measure expertise at the category level. We define the expertise, e, of user u for category c as

$$e(u,c) = \varphi(u,c)\left(1 - \frac{\sum_{j \in C(i)} \sum_{a \in U(j)} \left|R_{u,j} - R_{a,j}\right|}{N_c(i)}\right) \qquad (4)$$

where U(j) is the set of users who exhibited Web-usage behaviors for item j (except for target user u), C(i) is the set of items that have Web usage for the category of target item i, N_c(i) is the cardinality of C(i), and \varphi(u, c) is the activity weighting.

We define \varphi(u, c) as 1 - 1/n (where n is the number of ratings in the category) to obtain a higher value of expertise for a user as that user rates more items in that category. We select the users whose expertise is within the top 3 percent per category as expert users, because they showed the best recommendation results in terms of precision and coverage compared to other top-ranking user groups. We define the expert users' prediction term, E(i), as the expertise-weighted sum of the expert users' relative preference compared to their average preference. We calculate this term as

$$E(i) = \frac{\sum_{u=1}^{n} e(u,c)\,(R_{u,i} - \bar{R}_u)}{\sum_{u=1}^{n} e(u,c)} \qquad (5)$$

where R_{u,i} is the rating of expert user u for item i, and n is the number of expert users in category c.
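Equation 4's exact normalization is hard to recover from the text, so the sketch below is only one plausible reading: expertise as the activity weight \varphi(u, c) times one minus the user's average rating disagreement with other users on the category's items. Averaging over raters per item and dividing by 3 (the rating range) are our assumptions, made so the result stays in [0, 1]; E(i) then mirrors equation 3 with e(u, c) in place of s(u, a).

```python
def expertise(u, category_items, ratings):
    """One plausible reading of equation 4: activity-weighted agreement
    between user u's ratings and other users' ratings in a category.
    ratings maps user -> {item: rating on the 0..3 scale}."""
    rated = [j for j in category_items if j in ratings[u]]
    if len(rated) < 2:
        return 0.0  # phi(u, c) = 1 - 1/n is 0 for a single rating anyway
    phi = 1.0 - 1.0 / len(rated)  # activity weighting
    diffs = []
    for j in rated:
        others = [a for a in ratings if a != u and j in ratings[a]]
        if others:
            # Mean absolute disagreement, scaled by the rating range (3).
            diffs.append(sum(abs(ratings[u][j] - ratings[a][j])
                             for a in others) / (len(others) * 3.0))
    disagreement = sum(diffs) / len(diffs) if diffs else 1.0
    return phi * (1.0 - disagreement)
```

A user who rates many items in a category and agrees closely with other raters scores near 1; a user with few or idiosyncratic ratings scores near 0.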

Extracting source receptivity. The final phase extracts the heterogeneous source receptivity for the information sources with reference to the users and product domains. In other words, we build the following source-receptivity model, under the assumption that each user demonstrates different susceptibility to group influence:

$$R_{\mathrm{past\_rating}}(a,i) = \bar{R}_a + k_s\,S(a,i) + k_e\,E(i) + C \qquad (6)$$

Here, a and i denote a target customer and an item number. k_s and k_e are the importance weights of S(a, i) and E(i), that is, a user's receptivity to the similar-user group's recommendation and the expert-user group's recommendation, respectively. We estimate these by multiple-regression analysis using the least-squares method. If an overlap exists between similar and expert users, there's a high probability of multicollinearity between S(a, i) and E(i). In that case, measuring each group's influence is difficult. So, when multicollinearity exists between the two variables (we consider it to exist if the variance inflation factor is greater than 10), we use ridge-regression analysis to estimate k_s and k_e.
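The estimation step might look like the following sketch. The data, the ridge penalty, and the two-regressor VIF formula (1/(1 - r^2)) are illustrative assumptions; the paper doesn't report a penalty value.

```python
import numpy as np

# Illustrative data: for each past rating of target user a, the observed
# relative rating R(a,i) - mean(R_a) and the prediction terms S(a,i), E(i).
S = np.array([0.8, -0.2, 0.5, 0.1, -0.6, 0.9])
E = np.array([0.4, -0.1, 0.6, 0.3, -0.5, 0.7])
y = np.array([0.7, -0.2, 0.6, 0.2, -0.6, 0.8])

X = np.column_stack([S, E, np.ones_like(S)])  # columns: S, E, intercept C

# Variance inflation factor for two regressors: VIF = 1 / (1 - r^2).
r = np.corrcoef(S, E)[0, 1]
vif = 1.0 / (1.0 - r**2)

if vif > 10:
    # Multicollinearity: fall back to ridge regression.
    lam = 0.1  # hypothetical penalty
    ks, ke, C = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
else:
    # Ordinary least squares.
    ks, ke, C = np.linalg.lstsq(X, y, rcond=None)[0]
```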

Online recommendation

DISCORS generates personalized recommendations in real time by combining each user's source-receptivity values with each information source's prediction terms. Here we explain recommendation procedures for an existing user, a new user, and a new item. Figure 3 shows the pseudocode for our recommendation algorithm.

Recommendation for an existing user. DISCORS gets the prediction terms after it forms each recommender group for the target user and item. Next, it finds the target user's source receptivity, extracted during offline mining. Then, it determines the target user's recommendation score by multiplying each source's prediction term by the user's receptivity value:

$$P_{\mathrm{prediction}}(a,i) = \bar{R}_a + k_s\,S(a,i) + k_e\,E(i) + C \qquad (7)$$

If only one information source exists (for example, there are no similar users who rated the target item), DISCORS employs the single-source receptivity calculated with the existing information source only (see figure 3).

Recommendation for a new user or a new item. Because finding similar users for a new user is impossible, DISCORS provides recommendations based on the expert users for the item's category. For new (that is, early-stage) items, we might not find similar users who rated the item, either. So, recommendations are once again based on those expert users. As time passes and Web use increases, DISCORS will apply the same recommendation procedure for existing users to these cases.
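The combination rule and its fallbacks (summarized in figure 3) can be sketched as a single dispatch function; passing None for a missing prediction term and falling back to the user's mean when both are missing are our conventions:

```python
def recommend_score(avg_ra, S_ai, E_i, ks, ke, C,
                    kso=None, Cs=None, keo=None, Ce=None):
    """Recommendation score P(a, i) with the fallbacks of figure 3.
    S_ai or E_i may be None when that information source doesn't exist.
    kso/keo and Cs/Ce are the single-source receptivities and constants,
    estimated separately during offline mining."""
    if S_ai is not None and E_i is not None:
        return avg_ra + ks * S_ai + ke * E_i + C       # both sources
    if S_ai is not None:
        return avg_ra + kso * S_ai + Cs                # no experts for the category
    if E_i is not None:
        return avg_ra + keo * E_i + Ce                 # new user or new item
    return avg_ra                                      # no source: fall back to mean
```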

Pilot system implementation

Our pilot DISCORS system consists of six software agents and five databases (see figure 4a). Figure 4b shows the Web interface. The pilot system operates each agent independently so that the whole system remains stable during experimental substitutions or adjustment of an agent.

The user profile creation agent creates and manages user rating profiles through offline Web-usage-mining tasks such as periodically collecting, parsing, and analyzing real transaction data from the Web-usage database, customer database, and product database. It integrates the Web-usage data in a form suitable for the recommendation method.

The SCF agent and ECF agent activate and manage the parts of our CF algorithm that calculate the similar-user group's and the expert-user group's prediction terms, respectively. DISCORS uses these prediction terms to extract source receptivity and generate recommendations.

The receptivity extraction agent extracts each user's receptivity for the dual information sources. For this task, the agent analyzes users' past rating profiles and the dual information sources' prediction terms.

The recommendation generation agent makes a personalized recommendation list for each target user according to the algorithm in figure 3. For each target user, it determines recommended products that reflect his or her source receptivity.

Finally, the Web interface management agent provides user interfaces enabling Web-usage behaviors such as selecting a category or content, previewing e-learning content, and making electronic payments. Figure 4b shows the interface for presenting recommendation lists.

Evaluating DISCORS performance

We wanted to answer these questions:

• How does DISCORS perform compared to CF methods based on a single-information-source model?
• How does the degree of product involvement affect the performance of DISCORS compared to that of CF methods based on a single-information-source model?
• How does data sparsity affect the performance of DISCORS compared to that of SCF methods?

We compared DISCORS to three CF recommender systems based on a single-information-source model. We used these benchmark systems:

• SCF (similarity-based CF): a single-information-source model with one criterion;
• ECF (expertise-based CF): a single-information-source model with one criterion;
• HCF (hybrid CF with similarity weighting and expertise weighting): a single-information-source model with two criteria and homogeneous source receptivity; and
• DISCORS: a dual-information-source model with two criteria and heterogeneous source receptivity.

Evaluation metrics

To evaluate DISCORS's performance, we employed two broad classes of recommendation-accuracy metrics.

The first is predictive-accuracy metrics. Here, we use the mean absolute error (MAE) to compare each system's predictive accuracy. MAE is the absolute difference between a real and a predicted rating value. We use coverage, the number of items for which predictions can be formed as a percentage of the total number of items, to compare the range of recommendations for each system.
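A minimal sketch of the two predictive-accuracy metrics, with invented toy data; marking unpredictable items as None and excluding them from MAE while counting them against coverage is our convention:

```python
def mae(predicted, actual):
    """Mean absolute error over the items for which a prediction exists."""
    pairs = [(predicted[i], actual[i]) for i in actual
             if predicted.get(i) is not None]
    return sum(abs(p, ) if False else abs(p - a) for p, a in pairs) / len(pairs)

def coverage(predicted, actual):
    """Share of items (in percent) for which a prediction can be formed."""
    covered = sum(1 for i in actual if predicted.get(i) is not None)
    return 100.0 * covered / len(actual)

# Toy example (illustrative numbers only).
actual = {"i1": 2.4, "i2": 1.0, "i3": 2.9}
predicted = {"i1": 2.1, "i2": 1.4, "i3": None}  # no prediction for i3
```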

The second class is classification-accuracy metrics. To evaluate how well the recommendation lists match the users' preferences, we employ the widely used precision, recall, and F1 measures. If a user rates an item as being greater than 70 percent of the maximum preference value (which is 3) or has purchased the item, we consider that the user prefers that item. DISCORS recommends an item when that item's predicted rating is greater than 70 percent of the maximum preference value. Precision refers to the proportion of recommended items that the user actually prefers, whereas recall refers to the proportion of preferred items that the system actually recommends. F1 is the harmonic mean of precision and recall, that is, (2 * precision * recall)/(precision + recall).

Figure 3. Pseudocode for the recommendation algorithm:

Algorithm: DISCORS
Input: R: user rating matrix; k: source receptivity
Output: P(a, i): recommendation score

Main()
  for all items i
    for all users u
      if (u = new user) then call NewUserRecomm();
      elseif (i = new item) then call NewItemRecomm();
      else call ExUserRecomm();
      endif
    endfor
  endfor
end Main()

ExUserRecomm()
begin
  calculate prediction terms of dual recommender groups S(a, i), E(i);
  if (both S(a, i) and E(i) exist) then
    select ks, ke, C;
    calculate P(a, i) = Avg(Ra) + ks*S(a, i) + ke*E(i) + C;
  elseif (E(i) doesn't exist) then
    select kso, Cs;  // kso is the single-source (similar-user group) receptivity
    calculate P(a, i) = Avg(Ra) + kso*S(a, i) + Cs;
  elseif (S(a, i) doesn't exist) then
    select keo, Ce;  // keo is the single-source (expert-user group) receptivity
    calculate P(a, i) = Avg(Ra) + keo*E(i) + Ce;
  endif
end ExUserRecomm()

NewUserRecomm()
begin
  calculate prediction term of expert-user group E(i);
  calculate P(a, i) = Avg(Ru) + E(i);  // Avg(Ru) is the average of all users' ratings
end NewUserRecomm()

NewItemRecomm()
begin
  calculate prediction term of expert-user group E(i);
  select keo, Ce;
  calculate P(a, i) = Avg(Ra) + keo*E(i) + Ce;
end NewItemRecomm()
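The classification metrics with the 70-percent threshold can be sketched as follows; the toy predictions, ratings, and purchase list are invented:

```python
THRESHOLD = 0.7 * 3  # 70 percent of the maximum preference value (3)

def prf1(predicted, actual, purchased=()):
    """Precision, recall, and F1 under the 70-percent preference threshold.
    A user prefers an item if they rated it above the threshold or bought it."""
    recommended = {i for i, p in predicted.items() if p > THRESHOLD}
    preferred = {i for i, r in actual.items() if r > THRESHOLD} | set(purchased)
    hits = recommended & preferred
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(preferred) if preferred else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: three items recommended, four preferred (one via purchase),
# two hits.
p, r, f = prf1(
    predicted={"a": 2.5, "b": 2.2, "c": 2.3, "d": 1.0},
    actual={"a": 2.8, "b": 1.0, "c": 2.9, "d": 2.6},
    purchased=["e"],
)
```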

A preliminary experiment with research data

Before we implemented DISCORS, we evaluated its feasibility with a research data set that was open to the public. In the preliminary experiment, we used the MovieLens data set consisting of approximately 1 million ratings involving 6,040 users and 3,900 movies (www.grouplens.org/node/12). To evaluate each recommender system, we separated this data set into two parts:

• a modeling set containing the 6,040 users' ratings of 3,000 movies and
• a validation set containing those users' ratings of the remaining 900 movies.

Table 1 shows the results. The MAE of DISCORS is approximately 4.5 percent lower than that of SCF, 8.7 percent lower than that of ECF, and 5.8 percent lower than that of HCF, at a significance level of 1 percent. Although the performance gain of DISCORS over SCF isn't high, it does indicate our system's superiority, and we expect the gain to increase as product involvement or data sparsity increases. The coverage of DISCORS exceeds that of SCF by 9.30 percent and that of HCF by 2.90 percent.

Experiments with real Web-usage data

Paran.com (www.paran.com) is a Web portal operated by Korea Telecom Hitel, a subsidiary of Korea Telecom. This site, which has approximately 16 million subscribers and 8 million unique visitors per week, is a major Korean digital-content provider. Paran.com provided us with the Web-usage data and purchasing data pertaining to e-learning content for foreign languages and to digital-comics content (digitalized comic books), logged from 1 January to 30 June 2006. The e-learning content comprises various English, Japanese, and Chinese categories and provides 192 items. The digital-comics content comprises 456 items in eight categories (such as action and drama).

Through data preparation and Web-usage mining, we obtained 14,731 ratings of 1,452 users for 126 items in the e-learning content and 373,514 ratings of 11,245 users for 318 items in the digital-comics content, standardized from 0 to 3. We divided each data set into a modeling set and a validation set. The modeling set contained randomly selected items amounting to 80 percent of the total items; the validation set contained the remaining items.

Figure 4. The implementation of the pilot DISCORS system: (a) the system architecture and (b) the Web interface for presenting recommendation lists.

The e-learning content is more expensive (US$44 to $50 per lecture) than the digital-comics content ($0.1 to $0.3 per volume). Also, typical usage of e-learning content lasts more than one month, much longer than for digital comics. So, we classified the e-learning content as a relatively high-involvement product and the digital-comics content as a relatively low-involvement product. We then compared the systems' performance on the basis of product involvement.

Classification accuracy with product involvement. The rating-profile data we used in this experiment didn't come directly from users; we inferred the data through Web-usage and purchasing results. So, we compared the systems' performance by measuring classification accuracy. We expected that DISCORS would perform better than single-information-source CF because our system considers source diversity and heterogeneous source receptivity. Also, because consumers will more likely listen to experts' opinions as their product involvement increases, we assumed that DISCORS would perform better for high-involvement products.

    Table 2 shows the results. DISCORS outperformed SCF by 26.0 per-

    cent for e-learning content and 10.34 percent for digital-comics con-

    tent, with F1 values at a significance level of 1 percent. Furthermore,the performance gain of DISCORS over SCF was significantly higher

    for e-learning than for digital comics. This supports our hypothesis

    that DISCORS performs even better as product involvement increases.

We initially expected that DISCORS would perform worse than ECF for e-learning content because consumer reliance on experts tends to increase as product involvement increases. Contrary to our expectation, DISCORS outperformed ECF by 16.33 percent for e-learning content and 13.08 percent for digital-comics content. However, the difference isn't statistically significant.

DISCORS also outperformed HCF by 7.18 percent for e-learning content and 9.83 percent for digital-comics content. This result supports our assumption that a dual-information-source model can outperform a single-information-source model. However, the difference in the performance gains across products isn't significant.

The effects of sparsity. CF methods' performance depends on the availability of a critical mass of ratings. Conventional CF methods exhibit the data-sparsity problem; that is, the recommendation quality decreases suddenly as data sparsity increases. We assumed that a CF method using a dual-information-source model will perform well even with data sparsity. To prove this assumption, we compared the performance of DISCORS to SCF for various levels of data density.

The original data sparsity levels were 1 − 14,731/(1,452 × 126) = 0.9195 for e-learning

Recommender Systems

    36 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

Table 1. The predictive accuracy of DISCORS and three benchmark collaborative-filtering systems.

System                       Mean absolute error   Coverage (%)      t-value (p < 0.01)
DISCORS                      0.6923                98.96
SCF (similarity-based CF)    0.7250 (4.5%)*        90.52 (9.30%)*    6.952
ECF (expertise-based CF)     0.7584 (8.7%)*        85.71 (15.50%)*   24.075
HCF (hybrid CF)              0.7348 (5.8%)*        96.13 (2.90%)*    21.534

*The figures in parentheses indicate the performance gain of DISCORS over that benchmark system.
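The two metrics in table 1 can be sketched as follows (an illustrative sketch with made-up ratings, not our data; here `None` marks an item for which a system produced no prediction, which is exactly what coverage counts):

```python
def mae_and_coverage(predictions, actuals):
    """MAE over the items the system could predict; coverage is the
    percentage of items for which it produced any prediction at all."""
    pairs = [(p, a) for p, a in zip(predictions, actuals) if p is not None]
    mae = sum(abs(p - a) for p, a in pairs) / len(pairs)
    coverage = 100.0 * len(pairs) / len(predictions)
    return mae, coverage

mae, cov = mae_and_coverage([2.5, None, 3.0, 4.0], [3.0, 2.0, 3.0, 3.0])
# mae = (0.5 + 0.0 + 1.0) / 3 = 0.5, cov = 75.0
```

The trade-off in table 1 is visible in this formulation: a cautious system can lower its MAE by declining to predict, but only at the cost of coverage.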

Table 2. The classification accuracy of DISCORS and three benchmark systems, for high (e-learning) and low (digital-comics) product involvement.

                                     Precision              Recall                  F1
System                               E-learning  Comics     E-learning  Comics      E-learning  Comics
DISCORS                              0.3258      0.3363     0.2088      0.3187      0.2545      0.3272
SCF                                  0.2257      0.2696     0.1828      0.3295      0.2020      0.2966
                                     (44.35%)*   (24.74%)*  (14.23%)*   (−3.30%)*   (26.00%)*   (10.34%)*
t-value between DISCORS and SCF      82.563      63.452     8.875       4.512       8.924       4.572
t-value between domains              8.276                  3.548                   4.853
ECF                                  0.2972      0.2677     0.1731      0.3149      0.2188      0.2894
                                     (9.62%)*    (25.63%)*  (20.62%)*   (1.20%)*    (16.33%)*   (13.08%)*
t-value between DISCORS and ECF      2.872       3.548      15.602      4.669       12.245      9.642
t-value between domains              3.300                  4.887                   0.085
HCF                                  0.2804      0.2887     0.2059      0.3078      0.2374      0.2979
                                     (16.19%)*   (16.49%)*  (1.41%)*    (3.53%)*    (7.18%)*    (9.83%)*
t-value between DISCORS and HCF      12.533      13.234     0.072       8.187       10.669      6.715
t-value between domains              0.069                  0.877                   0.819

*The figures in parentheses indicate the performance gain of DISCORS over that benchmark system (p < 0.01).


content and 1 − 373,514/(11,245 × 318) = 0.8955 for digital comics.
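The sparsity arithmetic can be checked directly from the reported counts (a one-function sketch; the names are our own):

```python
def sparsity(nonzero_ratings, n_users, n_items):
    """Fraction of empty cells in the user-item rating matrix."""
    return 1 - nonzero_ratings / (n_users * n_items)

e_learning = round(sparsity(14_731, 1_452, 126), 4)        # 0.9195
digital_comics = round(sparsity(373_514, 11_245, 318), 4)  # 0.8955
```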

We obtained seven different density levels as follows. After dividing the data sets into modeling and validation portions, we retained 100 percent, 87.5 percent, 75 percent, 62.5 percent, 50 percent, 37.5 percent, and 25 percent of the nonzero entries in the modeling set, by randomly removing nonzero entries.

Density affects the F1 values for DISCORS

and SCF and the performance gain of DISCORS for both e-learning and digital-comics content (see figure 5). For e-learning, as the density decreases, the performance gain increases from 26.0 percent to 36.4 percent (F = 2.956, p < 0.01). For digital comics, as the density decreases, the performance gain increases from 10.3 percent to 32.0 percent (F = 4.275, p < 0.01). The F statistic provides a test for the statistical significance of the difference in the observed DISCORS performance gain over sparsity levels through ANOVA analysis.

These results have three interesting implications. First, the lower the data density (the higher the data sparsity), the better the performance gain of DISCORS relative to SCF. This implies that DISCORS helps mitigate the data-sparsity problem regardless of product domain. Second, for relatively low product involvement, the performance gain of DISCORS relative to SCF is more sensitive to data density. This implies that for high-involvement products, sparsity doesn't affect SCF; however, this isn't the case for low-involvement products. Finally, ECF, a component of DISCORS, is more effective for high-involvement products. This supports our assumption that consumers tend to be more receptive to experts' opinions as product involvement increases.
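The one-way ANOVA F statistic used in this comparison follows the standard between-group/within-group decomposition (a stdlib-only sketch; the gain values in the example are invented for illustration, not our measurements):

```python
def anova_f(groups):
    """One-way ANOVA: F = (between-group mean square) / (within-group mean square)."""
    n = sum(len(g) for g in groups)          # total observations
    k = len(groups)                          # number of groups
    grand = sum(sum(g) for g in groups) / n  # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical performance gains (%) observed at three sparsity levels:
f = anova_f([[26.0, 25.1, 27.2], [30.4, 31.0, 29.5], [36.4, 35.8, 37.1]])
```

A large F indicates that the gain differs across sparsity levels by more than within-level noise would explain, which is the question the test answers here.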

Visualizing source receptivity. We used a visualization to analyze the users' source receptivity with variations in the product domain. In figure 6, the x-axis represents receptivity to similar users' recommendations (ks), and the y-axis represents receptivity to expert users' recommendations (ke). For a low-involvement product (digital comics), most users have a low dependency on experts, as we expected. The centroid of the users' segment for that product

    MAY/JUNE 2007 www.computer.org/intelligent 37

[Figure: F1 values for DISCORS (D) and SCF (S), and the DISCORS performance gain (D − S)/S in percent, plotted against data density from 100 down to 25 percent of the original.]

Figure 5. How data sparsity affects DISCORS and SCF for (a) a high-involvement product (e-learning) and (b) a low-involvement product (digital comics).

[Figure: scatterplots of receptivity to similar users' recommendations (ks, x-axis) versus receptivity to expert users' recommendations (ke, y-axis).]

Figure 6. Users' source receptivity for (a) a high-involvement product (e-learning, 1,452 users) and (b) a low-involvement product (digital comics, 11,245 users).



is (1.126, 0.239). However, for a high-involvement product (e-learning), many users exhibit expert-user dependencies; their centroid is (0.8034, 0.5337).

We observed that you can classify users into expert-dependent users and neighbor-dependent users with respect to product domains. Accordingly, marketing staff in an e-commerce company can identify the most effective information source on the basis of the characteristics of individuals and product domains. Consequently, expert-dependent users could receive Web or mobile content that reflects expert users' recommendations, even for low-involvement products. Similarly, neighbor-dependent users could receive neighbors' recommendations, even for high-involvement products. This strategy will enable more effective and more adaptive personalized marketing.
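As a sketch, such a segmentation could be as simple as comparing each user's two receptivity weights (a hypothetical threshold rule with made-up user data; a real system might cluster the (ks, ke) pairs instead):

```python
def classify_user(ks, ke):
    """Label a user by the dominant information source: expert users
    when ke exceeds ks, similar-user neighbors otherwise."""
    return "expert-dependent" if ke > ks else "neighbor-dependent"

# Hypothetical (ks, ke) receptivity pairs for three users:
users = {"u1": (1.126, 0.239), "u2": (0.50, 0.95), "u3": (0.80, 0.53)}
labels = {u: classify_user(ks, ke) for u, (ks, ke) in users.items()}
```

A marketing system could then route expert-sourced or neighbor-sourced recommendations according to each user's label.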

Because our results are based on data of a particular e-commerce site and a specific research data set, we need to evaluate DISCORS with data sets from various e-commerce product domains. In addition, we need to devise a more refined technique for analyzing Web usage that can automatically extract both user preference and user credibility. Also, it would be interesting to expand DISCORS to other challenging e-commerce domains or environments that require a recommendation method. Although we implemented DISCORS for providing e-learning services in this study, we believe it's generally applicable to a variety of e-commerce recommender systems.

Acknowledgments
We thank Korea Telecom Hitel Paran.com for providing us with the Web usage data used in this research, and Jeeyoung Yoon for his research assistance.

References
1. J.L. Herlocker et al., "Evaluating Collaborative Filtering Recommender Systems," ACM Trans. Information Systems, vol. 22, no. 1, 2004, pp. 5-53.
2. B. Sarwar et al., "Analysis of Recommendation Algorithms for E-commerce," Proc. 2nd ACM Conf. Electronic Commerce (EC 00), ACM Press, 2000, pp. 158-167.
3. U. Shardanand and P. Maes, "Social Information Filtering: Algorithms for Automating Word of Mouth," Proc. Human Factors in Computing Systems Conf. (CHI 95), ACM Press, 1995, pp. 210-217.
4. G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 6, 2005, pp. 734-749.
5. J. O'Donovan and B. Smyth, "Trust in Recommender Systems," Proc. 10th Int'l Conf. Intelligent User Interfaces (IUI 05), ACM Press, 2005, pp. 167-174.
6. D.F. Duhan et al., "Influences on Consumer Use of Word-of-Mouth Recommendation Sources," J. Academy of Marketing Science, vol. 25, Fall 1997, pp. 283-295.
7. T.S. Robertson, J. Zielinski, and S. Ward, Consumer Behavior, Scott, Foresman and Co., 1984.
8. T. Riggs and R. Wilensky, "An Algorithm for Automated Rating of Reviewers," Proc. 1st ACM/IEEE-CS Joint Conf. Digital Libraries (JCDL 01), ACM Press, 2001, pp. 381-387.
9. P. Massa and P. Avesani, "Trust-Aware Collaborative Filtering for Recommender Systems," On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE, LNCS 3290, Springer, 2004, pp. 492-508.
10. Y.H. Cho, J.K. Kim, and S.H. Kim, "A Personalized Recommender System Based on Web Usage Mining and Decision Tree Induction," Expert Systems with Applications, vol. 23, no. 3, 2002, pp. 329-342.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.

The Authors

Jinhyung Cho is an assistant professor in the Dongyang Technical College's Department of Computer and Information Engineering and a PhD candidate in the Seoul National University's Interdisciplinary Graduate Program of Technology and Management. His research interests include Web personalization, e-business, social computing, and knowledge management systems. He received his MS in computer engineering from the Korea Advanced Institute of Science and Technology. Contact him at the Dept. of Computer and Information Eng., Dongyang Technical College, 62-160 Kochuk-Dong, Kuro-Gu, Seoul, 152-714, Korea; [email protected].

Kwiseok Kwon is an assistant professor in the Anyang Technical College's Department of E-business and a PhD candidate in the Seoul National University's Interdisciplinary Graduate Program of Technology and Management. His research interests include Web personalization, new-service development, and the Semantic Web. He received his MS from the Interdisciplinary Graduate Program of Technology and Management. Contact him at the Dept. of E-business, Anyang Technical College, San 39-1, Anyang 3-Dong, Manan-Gu, Anyang, Gyeonggi-Do, 430-749, Korea; [email protected].

Yongtae Park is a professor in the Seoul National University's Department of Industrial Engineering and served as the director of SNU's Interdisciplinary Graduate Program of Technology and Management. His research interests include knowledge network analysis and online-service creation. He received his PhD in operations management from the University of Wisconsin-Madison. Contact him at the Dept. of Industrial Eng., Seoul National Univ., San 56-1, Shillim-Dong, Kwanak-Gu, Seoul, 151-742, Korea; [email protected].