SEED BY EXAMPLE:
A CONJOINT SOLUTION TO THE COLD-START PROBLEM IN
RECOMMENDER SYSTEMS
BY
BART VERSCHOOR
Author:
Bart Verschoor (S2585855)
b.b.verschoor@student.rug.nl
President Kennedylaan 244-I
1079 NV, Amsterdam
The Netherlands
+31 (0) 64 15 73 35 74
1st supervisor:
Dr. F. Eggers
f.eggers@rug.nl
Nettelbosje 2
Duisenberg Building (DUI321)
9747 AE, Groningen
The Netherlands
+31 (0) 50 363 70 65
2nd supervisor
Drs. N. Holtrop
n.holtrop@rug.nl
Nettelbosje 2
Duisenberg Building (DUI310)
9747 AE, Groningen
The Netherlands
+31 (0) 50 363 9621
Master Thesis
Date of completion:
08-06-2015
University of Groningen
M.Sc. Marketing (Intelligence profile)
Department of Marketing
Faculty of Economics and Business
1 Preface
This master thesis is an original intellectual product solely created by the author, Bart Verschoor. This master thesis is part of the master program Marketing (Intelligence profile) at the University of Groningen. It is submitted in order to fulfill the graduation requirements set forth by the university. The master thesis was developed between February and June of 2015. Major parts of the text are based on research by others; I went to great lengths to provide accurate references to these sources. No intended or unintended plagiarism occurred, nor have any parts been published before. The treatment of the participants was in accordance with the ethical standards of the APA. Furthermore, the JavaScript syntax and HTML/CSS markup were written by me. I would like to thank the open source community for providing the technologies that enabled this study. In particular, the work of Guy Morita on the JavaScript recommendation library raccoon was instrumental to this study.
1.1 About the author
In my spare time I read a lot about technology, in particular artificial intelligence techniques. There is a certain magical quality to the notion of an automated, autonomous intelligent system. Inspired by my brother Joris, who is insanely adept at programming, as well as by a mixture of sci-fi novels like ‘I, Robot’ and ‘Snow Crash’ and movies like ‘Terminator’, ‘The Matrix’, ‘Her’ and ‘Ex Machina’, I am spellbound by the elegance of automation.
Flashback to the cold winter of 2014: Prof. Dr. Felix Eggers introduced me to the concept of conjoint analysis. Its applications for new product development intrigued me greatly, but the labor-intensive, ad hoc nature of experimental research conflicted with my desire to develop intelligent automated algorithms such as those I had read about.
However, nearing the end of winter, I was convinced of the undeniable power of conjoint
analysis. It dawned on me that the conjoint procedure should somehow be automated in
order to scale its use to millions of real consumers as opposed to hundreds of survey
participants. As I delved into the machine learning literature I came across the research
stream of recommender systems. I noticed how conjoint analysis, a staple marketing
research tool, could address numerous challenges of recommender systems. The thought
of solving problems in one domain by applying the lessons from another domain excited
me, and I got to work on this interdisciplinary project. It was a challenging trial of my
abilities. This project provided me with invaluable lessons on recommendation engines, a
technique I plan on using for a future ecommerce project.
1.2 Acknowledgements
Prior to starting this master thesis I had heard of the horror stories and tragic tales of the
mythical monster that is the master thesis. The ordeal of completing the master thesis is well chronicled, and it had made me cautious of its perils. Contrary to the myths told in
the hallways of the Duisenberg building, writing the master thesis was a very enjoyable
process for me. I attribute this in large part to the pleasant cooperation with Prof. Dr. Felix Eggers, my supervisor and honours college mentor. His kind and calm approach has
made this more than a memorable period and I will look back upon it with great
satisfaction. In particular, I would like to thank him for his patience, his calm and
collected demeanor and his uncanny expertise in conjoint methodologies. The feedback I
received was insightful, precise and critical yet at the same time kind and supportive.
Credit is also due for generously providing me with the dataset for the conjoint experiment, upon which a large part of this thesis rests, as well as the appropriate orthogonal array. Providing this information kept the already daunting scope of this project within reasonable bounds. Working with Prof. Dr. Felix
Eggers was an enlightening experience. In the spirit of this master thesis’ topic, I would
strongly recommend his supervision to anyone.
Thanks are also due to my dear friend and mentor Maurice Sikkink for giving me
the final push I needed to pick up programming. Without his ability and willingness to
accelerate my learning, I would not have been able to pick up JavaScript, Node, Express, Raccoon, Mongoose, MongoDB and jQuery so fast. Thanks also for inspiring me to
pursue an academic career in the first place. It has been one of the most rewarding
experiences in my life. During this time I have discovered how to harness my
unquenchable thirst for knowledge productively.
Mentorship is the greatest gift one can bestow on others. As you two have invested in me,
so I will commit to invest in others. May the ripple effects of your mentorship echo
onward, reaching lives from one degree of separation to the next.
Bart Verschoor
Amsterdam, the Netherlands
June 7th, 2015
2 Table of Contents
1 Preface
  1.1 About the author
  1.2 Acknowledgements
3 Abstract
4 Introduction
5 Theoretical framework
  5.1 Recommender systems
  5.2 Cold-start problem
  5.3 Conjoint analysis (CA)
  5.4 Proposed recommender system
6 Methodological overview
  6.1 Conjoint utility model
  6.2 Collaborative filtering model
  6.3 Methods of collaborative filtering
    6.3.1 k-NN classification
    6.3.2 Similarity measure
7 Research design
  7.1 Research context
  7.2 Procedure
  7.3 Conjoint experiment design
    7.3.1 Stimuli
    7.3.2 Job design
    7.3.3 Seed node design
    7.3.4 Latent classes
  7.4 Implementation
  7.5 Experimental setting
  7.6 Predictive validity
8 Results
9 Conclusions and recommendations
  9.1 Findings
  9.2 Limitations
  9.3 Further research
10 References
11 Appendices
  11.1.1 Appendix 1: Custom Survey
  11.1.2 Appendix 2: Individual level cumulative hit rates per nth recommendation
  11.1.3 Appendix 3: surveyserver.js
  11.1.4 Appendix 4: jobrecsys.html
  11.1.5 Appendix 5: thesis.html
  11.1.6 Appendix 6: jobs.js
3 Abstract
This study aims to reduce the cold-start problem in recommender systems. In doing so, the study also investigates whether a hypothetical conjoint recommender would be viable. This study links conjoint analysis to a collaborative filter by seeding a database with latent classes derived from an a priori conjoint experiment. The study contrasts the performance over time of two conditions: a collaborative filter trained with latent classes and a benchmark condition using an untrained collaborative filter. The study finds a substantial improvement in mean hit rates by using the ad-hoc seeding strategy over not seeding, thereby partially addressing the cold-start problem.
Keywords: RecSys, Recommender System, Conjoint Analysis, Cold-start, Collaborative
Filter, Latent Class Analysis
4 Introduction
The ever-increasing burden of information overload has caused an epidemic of infobesity amongst consumers (Adomavicius & Tuzhilin, 2005) and companies (Rogers, Puryear & Root, 2013). With the advent of the big data trend, information became a ubiquitous commodity to be mined and analyzed. Information has exploded partly due to the increasing connectedness of the social graph; the decreasing cost of storage, which gives the internet an ever-growing ‘memory’; distributed computing, propelled by the widespread adoption of HDFS1 (Hadoop Distributed File System); the compounding effects of content creators whose content remains online to be shared indefinitely; and the fact that no metaphorical equivalent of a garbage truck exists for the internet to periodically clean up outdated content.
On one hand, consumer-related information overload presents an opportunity for
companies to mine and synthesize behavioral data into consumer insights, while on the
other hand product-related information overload obstructs the purchase decision of
consumers. Recommender systems offer a remedy for the scourge of product-related information overload by algorithmically separating signal from noise using
statistical techniques. Consumer-related information overload may fuel recommender
1 The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. (http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F)
systems that in turn reduce the burden of product-related information overload by
offering fitting product recommendations. Paradoxically, despite the widespread
availability of data, newly launched recommender systems suffer from data sparsity
issues. To separate the wheat from the chaff, recommender systems ideally use input data
that is as complete as possible. Automatically matching user preferences to items requires
that the system get to know the user over time so that it can identify patterns across other
users and/or items. However, a side effect of the information abundance is that users have
become increasingly impatient towards online services (Maier, Laumer, Eckhardt &
Weitzel, 2012). Users churn if the service is not immediately useful. Herein lies the crux
of the cold-start problem: How can recommenders familiarize themselves with users from
the start, without having any information about the user’s preferences? Many partial
solutions to the cold-start problem have been proposed (Schein, Popescul & Ungar, 2002;
Zhou, Yang & Zha, 2011; Liu, 2011).
This study artificially enriches the database by seeding nodes, derived from an a priori conjoint analysis, into a recommender system based on collaborative filtering2. This may
benefit companies that wish to launch new recommender systems without haphazardly
altering the intricate subtleties of proven recommender techniques.
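The seeding strategy can be sketched in a few lines. This is a hypothetical illustration only (the user ids, item ids and class preferences are invented; the actual implementation used Node.js with the raccoon library), but it shows the core idea: each latent class from the conjoint experiment is inserted as a pseudo-user whose ‘likes’ encode that class's estimated preferences, so a brand-new user immediately has potential neighbors in the collaborative filter.

```javascript
// Hypothetical sketch of the seeding strategy. All ids and
// preference lists are invented for illustration.
const db = new Map(); // userId -> Set of liked itemIds

function like(userId, itemId) {
  if (!db.has(userId)) db.set(userId, new Set());
  db.get(userId).add(itemId);
}

// Each latent class becomes a seed node: a synthetic user whose
// likes are the items scoring highest on that class's part-worths.
const latentClassSeeds = {
  "seed:class-1": ["job-A", "job-C"],
  "seed:class-2": ["job-B", "job-D"],
};
for (const [seedId, items] of Object.entries(latentClassSeeds)) {
  for (const item of items) like(seedId, item);
}

// A new user who likes a single item already overlaps with a seed
// node, so the filter can recommend the rest of that class's items
// instead of having nothing to work with.
like("new-user", "job-A");
```

Because "new-user" now shares "job-A" with the first seed node, a collaborative filter running on this database can immediately suggest "job-C", which would be impossible on an empty, unseeded database.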
A conjoint approach to the cold-start issue may result in recommender systems that can
better predict product-user fit for new users in the early stages of the user lifecycle.
Decision aids create a better customer experience (Xiao & Benbasat, 2007), which in turn
affects the bottom line of the firm. As data grows, recommender systems will gain a more prominent role in the reduction of choice overload. User retention may be
improved by reducing the cold-start problem. Better product recommendations early on
in the customer lifecycle may prevent users from churning before positive word-of-mouth
effects from new users can cascade into more widespread service adoption.
The study contributes to firms by proposing a data enrichment extension that
helps reduce the cold-start problem without having to alter previously implemented
algorithms. This is noteworthy because many variations of recommender systems exist. It
is not required to redesign each individual algorithm to implement the improvement. The
methods proposed in this study are broadly applicable.
2 This refers to training the algorithm by uploading special cases to the database, used to make predictions.
Literature on methodological issues in recommender systems is mostly concentrated in the fields of information retrieval, computer science, machine learning and artificial intelligence (see next chapter for a review). This study attempts to answer the methodological challenge of the cold-start problem with an interdisciplinary approach, namely by taking a technique common to marketing researchers and applying it to recommender systems. Recommender systems belong to the burgeoning field of marketing automation, which sits at the intersection of computer science, statistics and marketing. Viewing its methodological problems from a marketing background therefore brings a refreshing perspective, which surprisingly few studies have attempted in the past.
Secondly, this study could be a stepping-stone toward a fully automated conjoint-based
recommender system.
After a brief introduction to the focal topics and relevant techniques in this paper, a literature overview covering methodological issues and comparisons between recommenders is discussed, followed by a methodology section addressing the study design and implementation. Afterwards, the results of the study are reviewed, closed by a concluding discussion in which the implications of the findings are provided and the limitations and recommendations for future research are discussed.
5 Theoretical framework
5.1 Recommender systems
Recommender systems have gradually solidified their place in online technologies during the last 20 years. Although early pioneers such as the MIT Media Lab spin-off Firefly (Oakes, 1999) and Bellcore’s Video Recommender (Hill et al., 1995) did not survive the dotcom bubble, the technology persisted. Today, recommender systems are becoming an increasingly important driver of business performance (Thompson, 2008; Lamere & Green, 2008), connecting users to relevant information both faster and with more accuracy. A recommender system is an online decision aid that aims to reduce information overload (Chen, Wu & He, 2013). The goal is to expose users to information that would otherwise have been overlooked. Recommender systems
answer the need to balance the cognitive trade-off of finding the right product without
spending too much time searching for that product. The basic concept is that the better
the underlying information filter can connect item characteristics to user preferences, the
better the recommendation becomes, which results in a better customer experience and
business performance of the firm. This explains the widespread adoption of recommender
systems by corporations across a multitude of industries. There are three main techniques
in recommender systems. Refer to table 1 for an overview of recommendation
techniques.
Collaborative filtering (CF) ‘crowd sources’ the recommendation task by clustering people who show similar preference behavior into groups. The adage ‘birds of a feather flock together’ is the core assumption upon which collaborative filtering rests. It recommends to likeminded people products that other group members
have purchased before them. CF segments the market using a similarity measure upon
which it bases its predictions. CF is built solely upon the similarities between people and
ignores individual product features. As preferences are revealed by the behavior of
customers, the CF dynamically changes. Due to its social character CF can offer
serendipitous results. Another advantage of CF is that it is minimally affected by ramp-up
problems3. This means that it is computationally efficient to run on large scale databases
(Beutel, Weimer & Minka, 2014). This makes CF suited for real-life production
environments and explains its popularity amongst firms. However, CF requires a
feedback mechanism (likes or purchases) in order to work. This makes the CF sensitive
to data sparsity issues both for new items and for new users. This is often referred to as
the cold start problem.
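A minimal user-based collaborative filter can be sketched as follows. The data and ids are invented for illustration, and real systems (such as the raccoon library used later in this thesis) use more elaborate scoring: users are compared with cosine similarity over their binary ‘likes’, and unseen items liked by similar users are ranked by accumulated similarity.

```javascript
// Minimal user-based collaborative filter over binary "likes".
// Users and items are invented for illustration.
const likes = {
  alice: new Set(["a", "b", "c"]),
  bob: new Set(["a", "b", "d"]),
  carol: new Set(["x", "y"]),
};

// Cosine similarity between two like-sets.
function sim(u, v) {
  let overlap = 0;
  for (const item of u) if (v.has(item)) overlap++;
  const denom = Math.sqrt(u.size * v.size);
  return denom === 0 ? 0 : overlap / denom;
}

// Rank items liked by similar users that `user` has not seen,
// by the summed similarity of the users who liked them.
function recommend(user) {
  const scores = new Map();
  for (const [other, items] of Object.entries(likes)) {
    if (other === user) continue;
    const s = sim(likes[user], items);
    for (const item of items) {
      if (likes[user].has(item)) continue;
      scores.set(item, (scores.get(item) || 0) + s);
    }
  }
  return [...scores.entries()].sort((p, q) => q[1] - p[1]).map(e => e[0]);
}
```

Here recommend("alice") ranks "d" first: bob shares two of alice's three likes, so his unseen item outranks those of carol, with whom alice shares nothing.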
Content-based recommenders (CB) build a user profile consisting of product
feature preferences that are used to make predictions. It learns from users by taking note
of feature preferences. For instance, in the case of movies, these include movie genre,
actors and director. The CB compares previously liked or purchased products to products
with a similar score on that specific mix of features. CBs assume that, for instance, if you liked Quentin Tarantino’s Reservoir Dogs, Pulp Fiction and Kill Bill, you will probably also like Django Unchained and Inglourious Basterds. One caveat of CB is that it requires content-descriptors that state the product’s features. Text mining algorithms such as TF-IDF4 remedy this problem. Using CB often leads to results that are not very surprising, because the CB often keeps suggesting only very similar products. The problem here is that the main goal of recommender systems is to predict what type of product the user would like or purchase, not necessarily which products are similar to those previously liked or purchased. Content-based recommenders suffer from cold-start problems since user preferences are initially unknown. Unlike CF, CB does not rely on a community of users for its predictions. This results in similar performance of the CB regardless of the activity and number of users feeding information to the recommender system.
3 Ramp-up refers to the incremental reassignment of computational containers (fractions of machines) as they become available. This becomes relevant when big resource requests cannot be fulfilled due to over-utilized clusters. Instead of fulfilling one big request, the CF is continuously computed on a rolling basis.
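The TF-IDF weighting mentioned above can be illustrated with a toy corpus (the documents are invented; production systems add tokenization, stemming and different smoothing variants):

```javascript
// Toy TF-IDF: scores how distinctive a term is for one document
// relative to a small corpus. Documents are invented examples.
const docs = [
  "action movie with car chases",
  "romantic movie with weddings",
  "documentary about car factories",
];

// Term frequency: share of the document's words equal to `term`.
function tf(term, doc) {
  const words = doc.split(" ");
  return words.filter(w => w === term).length / words.length;
}

// Inverse document frequency; the +1 is a common smoothing choice.
function idf(term, corpus) {
  const n = corpus.filter(d => d.split(" ").includes(term)).length;
  return Math.log(corpus.length / (1 + n));
}

function tfidf(term, doc, corpus) {
  return tf(term, doc) * idf(term, corpus);
}
```

With this smoothing, "movie" (appearing in two of the three documents) scores zero for the first document, while "chases" (unique to it) scores positively, exactly the kind of distinctive feature signal a content-based recommender needs to build product profiles.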
Knowledge-based recommenders are hard-coded rules determined at the outset by a domain expert. The quality of the recommender system is contingent on the knowledge of the domain expert. Knowledge-based recommendations are very consistent due to their rigidity, but they do not evolve based on the behavior of users. Unfortunately, they do not capture market trends and the rules can become outdated quickly. This limited staying power is due to the static nature of knowledge-based recommenders and the dynamic nature of markets. Since knowledge-based recommenders do not rely on user behavior, they do not suffer from the cold-start problem.
Hybrid methods combine content-based recommendation and collaborative filtering by applying both. A hybrid is a multiple classifier system that selects whether to use the prediction of the content-based recommender or the collaborative filter by means of a selection filter, usually some sort of weighting scheme. This mixed approach is often riddled with complexity, making it difficult to design and implement. However, the combination of CF and CB is very potent. Companies that rely on recommender systems are often willing to adopt hybrid recommenders because they deliver great results.
4 TF-IDF refers to term frequency-inverse document frequency. It is a metric that states the importance of a keyword in one document compared to a collection of documents.
Collaborative Filtering
  Pros: negligible ramp-up effort; results are serendipitous; learns market segments; dynamic model
  Cons: requires rating feedback; cold start for new users; cold start for new items

Content-Based
  Pros: no community required; comparison between item characteristics; dynamic model
  Cons: content-descriptors necessary; cold start for new users; no serendipity

Knowledge-based
  Pros: deterministic recommendations; consistent quality of recommendations; no cold start
  Cons: knowledge engineering required; doesn't respond to trends; static model

Hybrid
  Pros: combines an arbitrary number of CF and CB techniques to leverage pros and offset cons; uses a selection algorithm
  Cons: difficult implementation

Table 1: Taxonomy of recommender systems
Within each of these groups, a myriad of design tweaks can be taken into account,
such as picking the appropriate similarity measure. Whichever recommender system one
decides to adopt, the basic concept is that recommender systems automate the prediction
process of items based on observed preferences from users by estimating utility levels
and sorting groups of similar items and/or people together.
Some notable examples include LinkedIn’s item based collaborative filtering
platform called Browsemaps, (Wu et al., 2014), Netflix’s ALS-WR, an abbreviation of
Alternating Least Squares with Weighted λ-Regularization (Y. Zhou, Wilkinson,
Schreiber & Pan, 2008), Amazon’s item-item collaborative filter (Linden, Smith & York,
2003), Hackernews’ ranking algorithm (Salihefendic, 2010), Reddit’s Hot Ranking
(Salihefendic, 2010) and Foursquare’s Explore (Moore, 2011).
As is evidenced by the preceding examples, there are many variations between
information filters. For instance, some recommenders take into account shilling attacks
from malicious users such as hotels that manipulate travel sites in order to increase their
bookings (Chirita, Nejdl & Zamfir, 2005) while some recommenders aim to increase
serendipity such that recommendations are not only accurate, but also surprising
(Kaminskas & Bridge, 2014). The main techniques used can be largely categorized into four groups, namely: (1) content-based recommenders, (2) collaborative filters, (3) knowledge-based recommenders and (4) hybrid recommenders, as shown in table 1 (Jannach & Friedrich, 2011). The term collaborative filter was originally coined at Xerox (Goldberg, Nichols, Oki & Terry, 1992) for use in its mail service Tapestry. The GroupLens project, developed at the University of Minnesota (Resnick, Iacovou & Suchak, 1994), produced a Usenet news client that supported collaborative filtering, notably improving on Tapestry. Other early uses include the music recommendation service Firefly, a spinoff from MIT’s project Ringo (Shardanand, 1994), PHOAKS (Terveen et al., 1997) and the historically significant web browser Mosaic. Content-based recommenders have their roots in information retrieval and cognitive filtering (Morita & Shinoda, 1994; Konstan & Recommender, 2004). Hybrid recommenders combine various
recommendation techniques and filter the best option (Adomavicius & Tuzhilin, 2005).
5.2 Cold-start problem
The cold start problem can arise in systems that require automated data modeling. The
cold start problem, also known as the sparsity problem, refers to the inability of an
automated system to provide meaningful recommendations due to a lack of data on which
the system can draw inferences. Although this could happen in various information
systems, recommender systems are especially prone to it (Masthoff, 2011). For example,
an ecommerce firm updates the web shop with a new item. This item has zero ratings, which results in the item never being recommended to users. Since similarity between users can only be computed from items they have rated, items with zero ratings will never be recommended. The cold-start problem also affects new users. Imagine a web shop with an extremely large inventory of different items, for instance a web shop using drop shipping (Chopra, 2003), and a modest number of users. It could happen that no overlap exists between the items rated by two different users (mathematically, their sets of rated items are disjoint). In this case, the recommender system
also suffers from the cold start problem. One more occurrence of the cold start problem
exists. Imagine a newly launched web shop with no users. The quality of the
recommender system will be poor for the web shop’s first users because the
recommender system is still untrained. Yet again the cold-start problem arises, potentially
leading to customer churn. This may or may not be critical depending on how strongly
the e-tailer relies on decision aids.
Content-based recommenders (CBR) must first construct a sufficiently detailed
model of the user’s tastes and preferences by querying or observing user behavior, for
instance by taking into account likes, ratings or purchase history. Before the
recommender system can make recommendations with some degree of intelligence, the
user must train the recommender system by revealing his/her preferences. Once the
system has created a sufficiently detailed user profile, the recommender system is able to
perform as it is intended.
Collaborative filters (CF) identify users who share similar rating patterns.
Collaborative filters suggest favored items to users that share a similar rating pattern but
only for those products that the active user has not seen yet (e.g. missing values). Without
other cluster members, the collaborative filter will fail to recommend products. The cold-start problem means that unrated items will never be recommended, a major limitation of this technique.
Knowledge-based recommenders (KBR) originate from the domain of case-
based reasoning (Kolodner, 1983; Kübler, 2005; Shank, 1999). It involves having a set of
examples and a way to search a database for those examples to find relevant solutions to
similar problems in the past. Knowledge-based recommenders are systems that use
explicit knowledge about item attributes and user preferences a priori. This explicit
knowledge is hardcoded into a body of deterministic rules upon which recommendations
are suggested. Traditionally, knowledge-based recommenders are used when
collaborative filtering and content-based filtering are inappropriate. Its main advantage is
the absence of the cold-start problem. However, knowledge-based recommenders require
knowledge engineering, are intrinsically static and do not react to short-term trends.
5.3 Conjoint analysis (CA)
The collaborative filtering procedure concerns itself with the estimation of utility
levels based on the notion that item choices of users can be extrapolated to other users
who show similar preference behavior patterns. Choice-based conjoint (CBC) analysis
takes a different approach. Where recommender systems aim to solve the estimation
of missing values for item ratings, CBC analysis aims to estimate the utilities of attribute
levels given an arbitrarily large set of hypothetical item combinations. It does so by
asking participants to make successive trade-offs between products. This study applies
CBC analysis. However, like recommender systems, there are many conjoint variants that
differ in their approach to estimating the utility of a product. A taxonomy of these
techniques can be formulated as (1) decompositional methods (e.g. ranking-based
conjoint (Green & Srinivasan, 1989), choice-based conjoint (Louviere & Woodworth,
1983), transaction-based conjoint (Ding, Park & Bradlow, 2009; Netzer et al., 2008), (2)
compositional methods (e.g. self-explicated method (Green & Srinivasan, 1989), paired
comparisons (Scholz, Meissner & Decker, 2010), adaptive self-explicated method (Efrati,
Lin, Toubia, Hebrew & Colloquium, 2007)) and (3) hybrid methods (e.g. ACA (Green,
Krieger & Bansal, 1988), ACBC (Chapman, Alford, Johnson, Weidemann & Lahav,
2009) and HIT-CBC (Eggers & Sattler, 2009)). The decompositional approach
statistically decomposes the attribute level preferences from overall product evaluation.
Choice-based conjoint (CBC). Products are described as a composition of features called attribute levels. The estimation reveals the relative importance of each attribute in order to identify the most appealing combination of attribute levels. Utility
levels are estimated based upon which choice options of the presented choice sets are
preferred by the respondents. In CBC, choices are modeled as dependent variables. CBC
allows for the inclusion of a no-choice option. This can be used to determine what
product requirements make up the threshold for purchase consideration, or market
relevance. A notable advantage of CBC is that it includes interaction effects. The
synergies between certain attribute levels are made explicit this way. These effects occur
when ‘the whole is greater than the sum of its parts’. One disadvantage is that CBC
requires rather complex designs, which require balanced, orthogonal arrays5. One must
take precautions with experiments since participants may resort to reduction strategies
when overwhelmed with choices. The number of choice sets and the information density
of each product presented must be limited to prevent wear out effects from occurring.
Ranking-based conjoint (RBC). In RBC, participants are asked to rate products on an ordinal scale. It makes intuitive sense to use ordinal ratings both for conjoint analysis and for collaborative filtering, since we are interested in knowing which
products are most preferred. However, several problems plague RBC. One major problem
is that the distance between different items is assumed to be equal. The limited
information RBC provides limits its applied use greatly. For instance, consider the
following example of a preference comparison between three cars. Imagine that the most
preferred option is a Bugatti Veyron, the second most preferred option is a McLaren SLR
and the least preferred option is a Fiat Panda. RBC assumes that the distance in
preference between a Bugatti Veyron and a McLaren SLR is the same as the difference
between a McLaren SLR and a Fiat Panda. Another problem is that RBC lacks a no-
choice option, which makes it impossible to compute a consideration threshold.
Transaction-based conjoint (TBC). TBC uses barter methods that facilitate ‘trade’
under challenging market conditions. Barter methods simulate markets where participants
(buyers and sellers) respond to barter offers made by other participants. The social nature
of a market simulation where goods are exchanged makes TBC effective in capturing rich
information. Price information is especially well captured, because unlike the discrete
nature of CBC price levels, buyers and sellers set prices based on their own judgment.
Contrary to CBC’s no-choice option used to derive the minimum requirements for
consideration, TBC is designed such that less-desirable products are also priced and
traded. TBC is susceptible to the endowment effect because it simulates buyers and sellers. The endowment effect is the bias whereby people value items more merely because they possess them (Kahneman et al., 1990). Although some
studies have shown that barter methods outperform CBC (Ding, Park & Bradlow, 2009),
barter methods take considerable implementation and participant coordination effort.
Table 2 summarizes the decompositional conjoint methods in a brief taxonomy.
5 http://support.sas.com/techsup/technote/ts723.html
Rating-based conjoint
• Method: participants asked to rate products on an ordinal scale
• Pros: easy to implement; using ordinal ratings in both CA and CF makes sense
• Cons: distance between rated items assumed equal; lacks ‘no-choice’ option; difficult to interpret

Choice-based conjoint
• Method: choices as dependent variables; asks for the most preferred option between several alternatives
• Pros: allows choice predictions; includes consideration with ‘no-choice’ option, signifying expected demand decrease; includes interactions between attributes (‘whole greater than sum of parts’); hierarchical Bayesian estimation allows individual-level estimation of part-worth utilities6
• Cons: complex design; participants resort to reduction strategies when faced with choice overload (wear-out effects)

Transaction-based conjoint
• Method: uses barter methods to facilitate ‘trade’ under adverse conditions; market simulation where respondents digitally trade products
• Pros: dynamic customization based on participants’ responses and outcomes; collects substantially more information with limited wear out; less desirable offers also addressed with negative offers
• Cons: complex design; participants’ outcomes are dependent on the outcomes of other participants; prone to loss-aversion bias (Kahneman & Tversky, 1984)

Table 2: Taxonomy of decompositional conjoint methods
The fundamental difference between recommender systems and conjoint analysis is that conjoint analysis is a non-automated approach based on eliciting preferences from survey participants. Conjoint analysis and latent class analysis are traditionally used in marketing research, particularly for segmentation analysis and new product development, where obtaining the optimal feature configuration, willingness to pay and the brand premium are crucial for the competitive performance of firms.
6 http://www.sawtoothsoftware.com/index.php?option=com_content&view=article&id=167
According to Kramer (2007), the accuracy of recommender systems is influenced by ‘task transparency’: the predictive strength of a recommender system increases as users gain a better understanding of how its recommendations were derived. Conjoint methods provide a more intuitive interface for users, which may capitalize on this preference for task transparency. This paper takes a step towards creating a recommender based on the conjoint methodology.
5.4 Proposed recommender system
The proposed recommender is a collaborative filter. The database is seeded with latent classes obtained from an a priori preference measurement survey on which conjoint analysis is applied. CF is chosen for this study because it derives inferences from similar users. In the case of CBR, seeding the database is not useful because CBR is not based on the premise of user similarity. Although hybrid recommenders are arguably the best performing recommender systems, using one in the context of this study would unnecessarily overcomplicate the study in terms of analysis and implementation.
The methodology proposed in this paper is to seed information obtained from a conjoint analysis (CA), containing user-item preferences, into a database upon which collaborative filtering (CF) is applied. This pre-loaded information consists of representative profiles from a latent class analysis, subsequently referred to as seed nodes.
The seed nodes provide the initial nodes upon which the CF infers correlations for new
users. Contrary to KBR, this approach does not restrict the system to static, deterministic
rules. The advantage of this methodology is that, as the taste profiles of users become
more complete (e.g. number of users and the number of ratings per user grow), users tend
to correlate more strongly to each other than to seed nodes, whereas at the start, users
correlate strongly to the seed nodes. In a sense, the seed nodes act as a set of training
wheels to which the CF can compare new users against. Since conjoint analysis can
calculate every hypothetical item, the seed nodes provide complete information that can
be used to segment new users efficiently based on their preference behavior patterns.
Conjoint analysis is appealing because it is possible to derive preference estimates
for any hypothetical product. In CA, each product is considered as a set of features of
which utility estimates can be obtained. The utility of every hypothetical product (defined
as a feature combination) can be calculated, which means the seed nodes will have zero
missing values. The seed nodes populating the database are able to guide the CF by
enabling it to recommend products that do not have any ratings. This is particularly
useful for e-commerce databases that contain niche, long-tail products or job databases
that contain a stream of job postings that can expire.
In summary, the study attempts to utilize the benefits of CF (dynamics, serendipitous results, integration of the social environment) while offsetting its main disadvantage: the cold-start problem.
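The seeding mechanism can be sketched in a few lines of JavaScript (the implementation language of this study). The data structures and function names below are illustrative only, not the actual recommendationRaccoon API: seed nodes carry a complete binary choice vector derived from conjoint utilities, so a brand-new user with only a couple of choices can already be matched to a segment.

```javascript
// Illustrative sketch: seed nodes have a choice recorded for every item,
// while real users start out with sparse profiles.
const seedNodes = {
  seed1: { j1: 1, j2: 1, j3: 0, j4: 0 },
  seed2: { j1: 0, j2: 0, j3: 1, j4: 1 },
};

// Jaccard similarity over the items both profiles have choices for.
function jaccard(a, b) {
  const shared = Object.keys(a).filter((k) => k in b);
  const m11 = shared.filter((k) => a[k] === 1 && b[k] === 1).length;
  const m01 = shared.filter((k) => a[k] === 0 && b[k] === 1).length;
  const m10 = shared.filter((k) => a[k] === 1 && b[k] === 0).length;
  const denom = m11 + m01 + m10;
  return denom === 0 ? 0 : m11 / denom;
}

// A new user with only two recorded choices still correlates with a seed
// node, so recommendations are possible from the very first choices.
const newUser = { j1: 1, j2: 1 };
const bestSeed = Object.keys(seedNodes).reduce((best, id) =>
  jaccard(newUser, seedNodes[id]) > jaccard(newUser, seedNodes[best]) ? id : best
);
```

As more real users accumulate ratings, the same similarity computation naturally starts matching users to each other rather than to the seed nodes.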
6 Methodological overview
6.1 Conjoint utility model
CBC regresses the product choice (dependent variable) on the various attribute levels (independent variables; e.g. marketing manager, €2500) in such a way that the utility of a product is a linear combination of part-worth utilities. Utility estimates can be further
optimized using vector specification or ideal points specification for each attribute and
then comparing the predictive results of the utility sum using likelihood ratio tests. The
random utility function is formulated in equations 1 and 2 (Eggers, 2014):

$u_{ni} = V_{ni} + \varepsilon_{ni}$  (1)

Where:
$n$ = user
$i$ = product (item or job)
$V_{ni}$ = systematic utility component (explained utility) of consumer $n$ for product $i$
$u_{ni}$ = utility of consumer $n$ for product $i$
$\varepsilon_{ni}$ = stochastic utility component (disturbance term) of consumer $n$ for product $i$

$V_{ni} = \sum_{k=1}^{K} \beta_{nk} x_{ik}$  (2)

Where:
$k = 1, \ldots, K$ indexes the attributes
$x_{ik}$ = dummy indicating the specific attribute level of product $i$
$\beta_{nk}$ = part-worth utility of consumer $n$ for attribute $k$
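Equation 2 amounts to summing the part-worth utilities of the attribute levels that compose a product. A minimal sketch (the part-worth values are invented for illustration, not estimates from this study):

```javascript
// Hypothetical part-worth utilities for one consumer.
const partWorths = {
  position: { 'Product manager': 0.4, 'Management consultant': -0.1 },
  location: { Amsterdam: 0.3, Groningen: -0.2 },
  holidays: { '25 days': 0.1, '30 days': 0.2 },
};

// V_ni = sum over attributes of the part-worth of the product's level (eq. 2).
function systematicUtility(job) {
  return Object.keys(job).reduce(
    (v, attr) => v + partWorths[attr][job[attr]],
    0
  );
}

const vJob = systematicUtility({
  position: 'Product manager',
  location: 'Amsterdam',
  holidays: '30 days',
}); // 0.4 + 0.3 + 0.2 = 0.9
```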
Choice-based conjoint is based on the nonlinear multinomial logit model (Kuhfeld, 2010). The logit transformation converts the discrete dependent variable (choice) into a continuous probability scale, such that a value can be calculated for each possible choice:

$p(i \mid J) = \dfrac{\exp(V_i)}{\sum_{j=1}^{J} \exp(V_j)}$  (3)
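Equation 3 can be evaluated directly by exponentiating each alternative's systematic utility and normalizing. A small sketch (the utility values are illustrative):

```javascript
// Multinomial logit choice probabilities (eq. 3): p(i|J) = exp(V_i) / Σ exp(V_j).
function mnlProbabilities(utilities) {
  const expV = utilities.map(Math.exp);
  const denom = expV.reduce((sum, e) => sum + e, 0);
  return expV.map((e) => e / denom);
}

// Three alternatives plus a no-choice utility of 0 (values are made up).
const probs = mnlProbabilities([0.9, 0.2, -0.5, 0.0]);
// The probabilities sum to one, and a higher utility yields a larger share.
```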
6.2 Collaborative filtering model
The core problem recommender systems aim to solve is the estimation of missing values
for item choices. These missing ratings are items that have not yet been shown to the
active user. Consider the following where:
N = set of all users
I = set of all items (e.g. jobs)
u = utility function
u: N × I → R, where R is an ordered set of real numbers within a certain range.
Such that each user $n \in N$ chooses the $i' \in I$ that maximizes the user's utility:

$\forall n \in N: \quad i'_n = \arg\max_{i \in I} u(n, i)$  (4)
Each user n in the set N can be defined by a profile that contains various attributes
pertaining to the user’s personal information such as sex, age, birthdate, email address,
and so forth. Likewise, each item i in the set I can be defined by a set of product features.
The utility $u$ is often only defined on a subset of $N \times I$; thus the utility $u$ must be extrapolated across the entire set $N \times I$. The task of a recommender system is to estimate the missing utility values. After estimation, the item $i$ with the highest utility is shown to the active user, according to equation 4 (Adomavicius & Tuzhilin, 2005).
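Equation 4 is simply an argmax over the estimated utilities of the candidate items. A sketch (the item identifiers and utility values are illustrative):

```javascript
// Pick the item with the highest estimated utility for the active user (eq. 4).
function recommend(estimatedUtilities) {
  return Object.keys(estimatedUtilities).reduce((best, item) =>
    estimatedUtilities[item] > estimatedUtilities[best] ? item : best
  );
}

const topItem = recommend({ j1: 0.12, j2: 0.87, j3: 0.55 }); // 'j2'
```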
6.3 Methods of collaborative filtering
6.3.1 k-NN classification
In collaborative filtering, the utility $u(n, i)$ of item $i$ for user $n$ is estimated based on the utilities $u(n', i)$ assigned to item $i$ by those users $n' \in N$ who are similar to user $n$. Various similarity measures can be applied. The focus of this paper will be on using the k-Nearest Neighbors classifier in conjunction with the Jaccard similarity coefficient. The k-Nearest Neighbors classifier, formulated in equation 5, finds the closest $k$ neighbors given the similarity coefficient (Wen, 2008), formulated in equation 7. This method can be applied successfully in real-world situations because the algorithm is efficient in its computational resource consumption: it only compares against the closest $k$ neighbors instead of the entire database.

$P_{n,i} = \dfrac{\sum_{i' \in N_n^k(i)} sim(i, i') \, R_{n,i'}}{\sum_{i' \in N_n^k(i)} \left| sim(i, i') \right|}$  (5)

Where:
$N_n^k(i)$ = $\{\, i' : i'$ belongs to the $K$ most similar items to $i$ and user $n$ chose $i' \,\}$
$K$ = top 5 nearest neighbors
$sim(i, i')$ = the binary Jaccard similarity coefficient in equation 7
$R_{n,i'}$ = the existing choice of user $n$ on item $i'$
$P_{n,i}$ = the prediction for user $n$ on item $i$

Any similarity measure can be used7 to quantify user similarity. In the implementation of this study, the Jaccard similarity coefficient is calculated for each user $n$. Then, a k-Nearest Neighbors classifier compares users only to
7 Some examples to choose from include using ant-based clustering (Nadi, Saraee, Jazi & Bagheri, 2011), K-Means or Fuzzy K-means clustering (Kim, 2003), genetic algorithms (Bobadilla, Ortega, Hernando & Alcalá, 2011), Naïve Bayesian Classifiers (Pronk et al., 2007), cosine vector similarity, Artificial Neural Networks (Mannan, Sarwar & Elahi, 2014) or different types of correlation metrics such as Pearson correlation, Constrained Pearson correlation and Spearman rank correlation (Ekstrand, 2010, p. 93).
their top 5 nearest neighbors, derived from a sorted list, to compute the final recommendation. The number of nearest neighbors (five) of the kNN classifier has implications for the latent class analysis discussed later in this paper. To distinguish between segments properly, the database must contain at least five segments for a fair comparison. If fewer than five users exist in the database, a new user will be compared against fewer than five nearest neighbors, resulting in a minor reduction in predictive power for the first users.
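The neighbor selection and prediction step of equation 5 can be sketched as follows, assuming binary choice vectors; the profile data and function names are invented for illustration:

```javascript
// Binary Jaccard similarity between two choice vectors of equal length (eq. 7).
function jaccardSim(a, b) {
  let m11 = 0, mDiff = 0;
  for (let x = 0; x < a.length; x++) {
    if (a[x] === 1 && b[x] === 1) m11++;
    else if (a[x] !== b[x]) mDiff++;
  }
  return m11 + mDiff === 0 ? 0 : m11 / (m11 + mDiff);
}

// Find the k most similar profiles from a sorted list (the k-NN step).
function nearestNeighbors(user, others, k) {
  return Object.entries(others)
    .map(([id, choices]) => ({ id, sim: jaccardSim(user, choices) }))
    .sort((p, q) => q.sim - p.sim)
    .slice(0, k);
}

// Prediction for one item (eq. 5): similarity-weighted average of the
// neighbors' existing choices on that item.
function predict(neighbors, others, itemIndex) {
  let num = 0, den = 0;
  for (const { id, sim } of neighbors) {
    num += sim * others[id][itemIndex];
    den += Math.abs(sim);
  }
  return den === 0 ? 0 : num / den;
}

const profiles = {
  a: [1, 1, 0, 1],
  b: [1, 1, 0, 0],
  c: [0, 0, 1, 0],
};
const user = [1, 1, 0, 0];
const nn = nearestNeighbors(user, profiles, 2); // b (sim 1), then a (sim 2/3)
const pItem3 = predict(nn, profiles, 3);
```

Here the prediction for the last item is pulled towards neighbor a's positive choice in proportion to a's similarity weight.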
6.3.2 Similarity measure
The Jaccard similarity coefficient is a metric that represents the similarity between users and their k-nearest neighbors by dividing the size of the intersection of their choices by the size of the union. This method is widely used in practice. In 2009, YouTube abandoned its 5-star rating system for a binary like/dislike system based on the Jaccard similarity coefficient (Rajaraman, 2009). The change was implemented because users rated videos either with five stars or one star, but hardly ever with two, three or four stars. The Jaccard similarity coefficient is formulated in equation 6:

$J(n, K) = \dfrac{|n \cap K|}{|n \cup K|}$  (6)
Where:
𝑛 = user
𝐾 = top 5 nearest neighbors
Given user $n$ and the users that comprise the k-nearest neighbors $K$, each with $x$ binary choices, the Jaccard coefficient measures the overlap between the choices of user $n$ and kNN $K$. Choices are binary, thus each choice of user $n$ and kNN $K$ can either be 0 or 1 (e.g. no or yes, dislike or like). The complete combination of choices $x$ between user $n$ and kNN $K$ are:

$M_{11}$ = total choices where user $n$ and kNN $K$ are both 1.
$M_{01}$ = total choices where the choice of user $n$ is 0 and the choice of kNN $K$ is 1.
$M_{10}$ = total choices where the choice of user $n$ is 1 and the choice of kNN $K$ is 0.
$M_{00}$ = total choices where user $n$ and kNN $K$ are both 0.
Equation 7 shows the binary Jaccard similarity coefficient between user $n$ and the kNN $K$ (Tan, 2007):

$J = \dfrac{M_{11}}{M_{01} + M_{10} + M_{11}}$  (7)
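Equation 7 translates directly into counting agreements and disagreements over the binary choice vectors. A sketch; note that M00, where both choices are 0, is deliberately left out of the denominator:

```javascript
// Binary Jaccard coefficient from M-counts (eq. 7): J = M11 / (M01 + M10 + M11).
function jaccardBinary(u, v) {
  let m11 = 0, m01 = 0, m10 = 0;
  for (let x = 0; x < u.length; x++) {
    if (u[x] === 1 && v[x] === 1) m11++;
    else if (u[x] === 0 && v[x] === 1) m01++;
    else if (u[x] === 1 && v[x] === 0) m10++;
    // Both 0 (M00) is ignored: shared non-choices carry no information.
  }
  return m11 / (m01 + m10 + m11);
}

const j = jaccardBinary([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]);
// M11 = 2, M01 = 1, M10 = 1, so J = 2 / 4 = 0.5
```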
7 Research design
7.1 Research context
The proposed methodology is applied to the prediction of job preferences
amongst students. Job recommenders provide an interesting context for two reasons.
Firstly, prior to embarking on a professional career, student profiles are reasonably homogeneous, given that students in the same field pursue similar academic degrees. Students often have limited work experience that could more distinctly differentiate them from one another. This is exemplified by the growing need for students to
differentiate themselves with extracurricular activities (Caplan, 2011) in order to become
attractive to top-tier firms (e.g. McKinsey, Boston Consulting Group, MorganStanley,
Goldman Sachs) and university admission officers (Allison, 2012). Lack of professional
experience implies that students have inherently incomplete profiles. Recommender
systems are capable of estimating missing values, in this case job preference regardless of
prior job experience.
Secondly, students often do not know exactly what type of job they wish to apply for after they graduate. There is therefore a clear need for good recommendations amongst students. Landing the right job is important for anyone, but especially so for
fresh graduates. For fresh graduates, their job choice is either a springboard that launches their professional careers on a trajectory for success or one that inhibits or dashes their professional aspirations.
Jobsites make use of recommender systems in order to match the job requirements of vacancies to the skillsets and preferences of their users. The ability to leverage recommendation technology separates successful jobsites from unsuccessful ones.
Jobsites range from general-purpose job boards (e.g. Linkedin.com, Monster.com,
HotJobs.com), niche job boards (e.g. Dice.com, Erexchange, Angellist.co), E-recruiting
application services providers (e.g. PeopleClick), E-recruiting consortium (e.g.
DirectEmployers.com) to corporate career websites (Al-Otaibi, 2012). This study
implements the proposed solution in a simulated experiment that could be applicable to
any professional career site using recommender systems.
7.2 Procedure
The study design is split into several parts. Chapter 7.3 covers the design of the
conjoint study and its subsequent latent class analysis with the goal of creating the seed
nodes. Chapter 7.4 covers how the collaborative filter was implemented and how the
latent classes were integrated. Chapter 7.5 covers the experimental design where students
were tasked to evaluate jobs using the recommender system and chapter 7.6 covers how
the predictive accuracy is measured.
7.3 Conjoint experiment design
The author of this study obtained permission to use the results of an employer choice
survey dataset from Dr. Eggers. The sample (N = 158) consists of 51% males and 49% females, with age distributed around the mean (M = 23.14, SD = 1.597). The sample consists
wholly of students following the graduate level course Marketing Engineering at the
University of Groningen in the Netherlands. This employer choice survey gathered
information about job preferences at the start of the course in early November 2013 and
2014. Students were incentivized to participate by awarding them an additional .3 grade points on an assignment. Incentive alignment has been shown to improve predictive accuracy and reduce hypothetical bias (Eggers & Sattler, 2011). Demographic
information asked prior to choice elicitation included the respondent’s age, work
experience in years, gender as well as whether or not the respondent lives in or has
friends and family from any of the four locations used in the survey. The students were
provided a fractional factorial conjoint choice elicitation task that was based on a
randomized orthogonal array that controls for the efficiency criteria of balance and
orthogonality (Eggers, 2015). It consisted of 12 choice sets each containing 3 alternatives
and a none-option as per a combination of attribute levels in table 3. The chosen stimuli
are relevant since students are nearing graduation and must actively seek jobs. The
hypothetical jobs are related to the curriculum of the graduate program.
Model specification design for conjoint analysis requires the testing for different
combinations of attribute utility measures (linear, quadratic or nominal), in order to find
the model with the best model fit and predictive power. Therefore, different models must
be compared manually. The criteria used to assess the final model are the Pseudo-R2,
adjusted pseudo-R2, hit rate and MAE (mean absolute error) as well as the information
criteria AIC and AIC3. The final model is then estimated using latent class analysis, a preference-based segmentation method for conjoint analysis. Each segment comprises utility values for each attribute. This assumes that utilities are distributed across participants who belong to discrete segments that vary in their preference patterns (latent classes). Respondents are classified into segments with a probability. These segments will become the initial seed nodes: special nodes in the database that are representative of the preferences of an entire segment, upon which the collaborative filter will make inferences.
Position
o Market researcher in a research company
o Market researcher within an organization
o Product manager
o Management consultant
Location
o Groningen
o Amsterdam
o Rotterdam
o The Hague
Company Size
o 50 employees
o 150 employees
o 500 employees
o 1500 employees
Holidays per Year
o 20 days
o 25 days
o 30 days
o 35 days
Income
o € 2489.-
o € 2766.-
o € 3042.-
o € 3180.-
Table 3: Attribute levels for conjoint design
7.3.1 Stimuli
Several prerequisite steps must be completed prior to starting the experiment, namely:
1. Determine the number of jobs and the selection of the job characteristics
2. Design the seed nodes using latent class analysis
3. Implement recommendationRaccoon, a javascript library for collaborative
filtering
4. Implement a custom web survey, seed the database with the seed nodes and
integrate the collaborative filter
7.3.2 Job design
The jobs that are to be recommended during the experiment were designed based on an orthogonal array for five 4-level attributes of size sixteen, provided by Dr. Eggers. A full
overview of the jobs is shown in table 4.

j1 = {Market researcher in a research company, Groningen, 50, 20, €2489}
j2 = {Market researcher in a research company, Amsterdam, 150, 25, €2766}
j3 = {Market researcher in a research company, Rotterdam, 500, 30, €3042}
j4 = {Market researcher in a research company, The Hague, 1500, 35, €3180}
j5 = {Market researcher within an organization, Groningen, 150, 30, €3180}
j6 = {Market researcher within an organization, Amsterdam, 50, 35, €3042}
j7 = {Market researcher within an organization, Rotterdam, 1500, 20, €2766}
j8 = {Market researcher within an organization, The Hague, 500, 25, €2489}
j9 = {Product Manager, Groningen, 500, 35, €2766}
j10 = {Product Manager, Amsterdam, 1500, 30, €2489}
j11 = {Product Manager, Rotterdam, 50, 25, €3180}
j12 = {Product Manager, The Hague, 150, 20, €3042}
j13 = {Management Consultant, Groningen, 1500, 25, €3042}
j14 = {Management Consultant, Amsterdam, 500, 20, €3180}
j15 = {Management Consultant, Rotterdam, 150, 35, €2489}
j16 = {Management Consultant, The Hague, 50, 30, €2766}
Table 4: Job definitions according to an orthogonal array
7.3.3 Seed node design
The number of seed nodes and their preferences were designed based on a latent class
analysis following a conjoint experiment. In the next sections, the steps to determine the
seed nodes are discussed.
Conjoint analysis
The independent variables of the model can be specified as linear, quadratic or part-worth functions. There are five attributes plus a no-choice attribute. The attributes location and position are specified as part-worth utilities since these can only be defined categorically. The no-choice option is defined numerically for computational convenience. The remaining three variables allow for eight different models (2³). The eight possible permutations are shown in table 5. Several different methods were used to assess model fit and predictive strength. A comprehensive overview of the test results on all models is shown in table 6.
Model    Position  Size  Location  Holidays  Income  No choice  df  LL(0)     LL(β)
Model 1  nom       nom   nom       nom       nom     num        16  -2628.41  -2193.70
Model 2  nom       nom   nom       num       nom     num        14  -2628.41  -2197.58
Model 3  nom       num   nom       num       nom     num        12  -2628.41  -2197.64
Model 4  nom       num   nom       num       num     num        10  -2628.41  -2198.82
Model 5  nom       nom   nom       num       num     num        12  -2628.41  -2198.76
Model 6  nom       nom   nom       nom       num     num        14  -2628.41  -2194.89
Model 7  nom       num   nom       nom       num     num        12  -2628.41  -2194.92
Model 8  nom       num   nom       nom       nom     num        14  -2628.41  -2193.74
Table 5: Eight models for conjoint estimation (nom = nominal, num = numeric)
Likelihood Ratio Test
The models are compared against a null-model using a χ² test. The χ² test statistic and degrees of freedom are derived according to equation 8:

$\chi^2 = -2\,\big(\ln\mathcal{L}(0) - \ln\mathcal{L}(\beta^*)\big)$  (8)

such that $\ln\mathcal{L}(0) = n \cdot c \cdot \ln\!\left(\frac{1}{m}\right)$ is the minimum likelihood and $\ln\mathcal{L}(\beta^*) = \sum_{n}\sum_{c} \ln p(i_{nc} \mid J_{nc})$ is the maximum likelihood.

Where:
$m$ = number of alternatives per choice set $J$
$n$ = number of consumers
$c$ = number of choice sets per consumer
$p(i \mid J)$ = the conditional probability of alternative $i$ given choice set $J$

with $df = npar_{\ln\mathcal{L}(\beta^*)}$, the number of estimated parameters.

The aim is to reject H0, which states that no differences exist between the null-model and the specified model. Each model outperformed the null-model (see the LL ratio test in table 6).
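As a sanity check, the χ² statistic of equation 8 can be reproduced from the log-likelihoods reported in table 5, e.g. for model 1 (LL(0) = -2628.41, LL(β*) = -2193.70):

```javascript
// Likelihood ratio test statistic against the null model (eq. 8).
function chiSquare(ll0, llBeta) {
  return -2 * (ll0 - llBeta);
}

// Model 1 from table 5: -2 * (-2628.41 - (-2193.70)) = 869.42,
// matching the LL ratio value reported for model 1 in table 6.
const chi2Model1 = chiSquare(-2628.41, -2193.70);
```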
Model  df  Pseudo-R²  Pseudo-R² adj.  Hit rate  MAE    LL ratio test           AIC(LL)  AIC3(LL)  BIC      CAIC
1      16  .1654      .1593           47.679%   4.92%  χ² = 869.4198, p < .01  4419.41  4435.41   4468.41  4468.51
2      14  .1639      .1586           47.046%   5.13%  χ² = 861.6768, p < .01  4423.15  4437.15   4466.03  4466.12
3      12  .1639      .1593           46.994%   5.05%  χ² = 861.5425, p < .01  4419.28  4431.28   4456.03  4456.11
4      10  .1634      .1594           46.835%   5.13%  χ² = 859.1870, p < .01  4417.64  4427.64   4448.27  4448.33
5      12  .1635      .1589           46.835%   5.13%  χ² = 859.3094, p < .01  4421.52  4433.52   4458.27  4458.35
6      14  .1649      .1596           47.046%   4.74%  χ² = 867.0514, p < .01  4417.78  4431.78   4460.65  4460.74
7      12  .1649      .1604           47.046%   4.76%  χ² = 866.9862, p < .01  4413.84  4425.84   4450.59  4450.67
8      14  .1654      .1600           47.521%   4.92%  χ² = 869.3488, p < .01  4415.48  4429.48   4458.36  4458.44
Table 6: Model fit tests
After each model was tested against the null-model, the nested models were pitted against each other with additional χ² tests. To do this, the χ² test statistic and the degrees of freedom were computed according to equations 9 and 10, where model 1 denotes the larger (more parameters) of the two nested models:

$df = npar_{model\,1} - npar_{model\,2}$  (9)

$\chi^2 = -2\,\big(\ln\mathcal{L}(\beta^*)_{model\,2} - \ln\mathcal{L}(\beta^*)_{model\,1}\big)$  (10)

When no significant difference between two nested models exists, the model with the smaller number of parameters is chosen, favoring parsimony over complexity. If a significant difference does exist, the larger model outperforms the smaller one, in which case the smaller model is deleted from subsequent testing rounds. This procedure continues until one model remains. In the end, model 7 performed best, eliminating model 8 in the final round ($\chi^2 = -2\,(-2194.9215 - (-2193.739)) = 2.3636$, $df = 14 - 12 = 2$, $p > .05$; thus the model with the smaller number of parameters, model 7, is favored for parsimony).
Pseudo-R2
The Pseudo-R² provides insight into the predictive strength of the model. To calculate the Pseudo-R², equation 11 is used:

$\text{Pseudo-}R^2 = 1 - \dfrac{\ln\mathcal{L}(\beta^*)}{\ln\mathcal{L}(0)}$  (11)
As expected, the highest Pseudo-R² is found in model 1 (Pseudo-R² = .1654). This is because the Pseudo-R² does not penalize for the number of parameters (df = 16). Since goodness of fit always increases with more parameters, the adjusted Pseudo-R² is calculated using equation 12. On this criterion, the best model was model 7 ($R^2_{adj} = .1604$).

$R^2_{adj} = 1 - \dfrac{\ln\mathcal{L}(\beta^*) - npar_{\ln\mathcal{L}(\beta)}}{\ln\mathcal{L}(0)}$  (12)
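Both statistics can be verified from model 7's log-likelihoods (LL(0) = -2628.41, LL(β*) = -2194.92, 12 parameters); a sketch:

```javascript
// McFadden pseudo-R² (eq. 11) and its parameter-adjusted variant (eq. 12).
function pseudoR2(ll0, llBeta) {
  return 1 - llBeta / ll0;
}
function pseudoR2Adj(ll0, llBeta, npar) {
  return 1 - (llBeta - npar) / ll0;
}

// Model 7: reproduces the .1649 and .1604 reported in table 6.
const r2 = pseudoR2(-2628.41, -2194.92);
const r2adj = pseudoR2Adj(-2628.41, -2194.92, 12);
```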
Hit rate
The hit rate calculates the percentage of observations that were predicted correctly (see equation 13):

$\text{Hit rate} = \dfrac{\#\ \text{observations predicted correctly}}{\text{total}\ \#\ \text{of observations}}$  (13)

Model 1 outperforms the others (hit rate = 47.68%) since the formula does not penalize for the number of parameters.
Mean absolute error
To assess the predictive strength of the models, a holdout validation was performed. A descriptive analysis on the holdout sample revealed how often the different choices were actually selected, which was compared against how often the different choices were predicted. The results for model 7 are provided in table 7 and equation 16, based on the mean absolute error formula in equations 14 and 15:

$MAE = \dfrac{1}{n} \sum_{i=1}^{n} |e_i|$  (14)

Where:

$e_i = \text{observed shares}_i - \text{predicted shares}_i$  (15)
                  Alternative 1  Alternative 2  Alternative 3  No choice
Observed shares   39.2%          29.1%          29.1%          2.5%
Predicted shares  32.23%         32.49%         35.28%         0%
Absolute error    6.97%          3.39%          6.18%          2.5%
Table 7: Absolute error for model 7 based on holdout sample

$MAE_{model\,7} = \dfrac{6.97 + 3.39 + 6.18 + 2.50}{4} = 4.76\%$  (16)
Model 7 (MAE = 4.76%) is the second best model in terms of mean absolute error, after
model 6 (MAE = 4.74%) as shown in table 6.
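The MAE in equation 16 is simply the mean of the absolute differences between the observed and predicted shares in table 7:

```javascript
// Mean absolute error over the holdout shares (eqs. 14-16).
function meanAbsoluteError(observed, predicted) {
  return (
    observed.reduce((sum, o, x) => sum + Math.abs(o - predicted[x]), 0) /
    observed.length
  );
}

// Shares for model 7 from table 7, in percentage points:
// (6.97 + 3.39 + 6.18 + 2.5) / 4 = 4.76
const mae = meanAbsoluteError(
  [39.2, 29.1, 29.1, 2.5],
  [32.23, 32.49, 35.28, 0]
);
```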
Information criteria
Lastly, the AIC, AIC3, BIC and CAIC information criteria were checked. AIC3 penalizes
more heavily for complex models compared to AIC (See equation 17 and 18). BIC and
CAIC penalize model complexity even more so (See equation 19 and 20).
$AIC = -2 \ln\mathcal{L}(\beta^*) + 2 \cdot npar_{\ln\mathcal{L}(\beta)}$  (17)

$AIC3 = -2 \ln\mathcal{L}(\beta^*) + 3 \cdot npar_{\ln\mathcal{L}(\beta)}$  (18)

$BIC = -2 \ln\mathcal{L}(\beta^*) + \ln(N) \cdot npar_{\ln\mathcal{L}(\beta)}$  (19)

$CAIC = -2 \ln\mathcal{L}(\beta^*) + (\ln(N) + 1) \cdot npar_{\ln\mathcal{L}(\beta)}$  (20)
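The criteria can be checked against table 6 for model 7 (LL(β*) = -2194.92, 12 parameters), assuming N = 158 respondents as the sample size; a sketch:

```javascript
// Information criteria (eqs. 17-19) from a log-likelihood, a parameter
// count and a sample size N.
function informationCriteria(llBeta, npar, N) {
  const deviance = -2 * llBeta;
  return {
    aic: deviance + 2 * npar,
    aic3: deviance + 3 * npar,
    bic: deviance + Math.log(N) * npar,
  };
}

// Model 7: AIC = 4413.84, AIC3 = 4425.84, BIC ≈ 4450.59, as in table 6.
const ic = informationCriteria(-2194.92, 12, 158);
```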
The most important model fit tests are the adjusted pseudo-R², the log likelihood ratio tests between models and the information criteria, since these favor simple models. In the end, model 7 outperformed every other model on these metrics. Model 7 had the lowest scores on each information criterion (AIC = 4413.84, AIC3 = 4425.84, BIC = 4450.59, CAIC = 4450.67) as well as the highest adjusted pseudo-R². The subsequent latent class segmentation is performed on model 7.
7.3.4 Latent classes
In order to find the optimal number of segments, the finite mixture model was estimated
with two to nine classes. Several information criteria were assessed to select the
appropriate model. BIC and CAIC are favored because these heavily penalize for model
complexity. According to this, the seven-class solution is most attractive (See table 8).
Classes  df  Class. error  npar  LL          AIC(LL)    BIC        CAIC       AIC3(LL)
6        81  0.0541        77    -1705.9121  3565.8241  3801.6440  3878.6440  3642.8241
7        68  0.0395        90    -1666.6074  3513.2148  3788.8484  3878.8484  3603.2148
8        55  0.0486        103   -1638.1301  3482.2602  3797.7075  3900.7075  3585.2602
9        42  0.0503        116   -1621.4840  3474.9679  3830.2290  3946.2290  3590.9679
Table 8: Information criteria of latent classes
Table 8 reveals that there is no single best model based on the information criteria. The solutions with six to nine classes each score best on at least one of AIC, AIC3, CAIC and BIC. Given this ambiguous result, further analysis was performed on the models with 6 to 9 latent classes.
This analysis included the estimation of the models (See table 9 for nine-class solution),
the calculation of sum utilities and the binomial logit model to construct probabilities
according to equation 21. Refer to table 10 for the result of the calculation of sum utilities
and the binomial logit transformation for the nine-class solution.
𝑝!" =
!" !!!!!!!! !!!∙!"#$%& !!!!! !!!∙!"#$!" !!!!!!!! !!!∙!"#$%& !!!!! !!!∙!"#$ !!" !!!
→𝑝!" < 50% = 𝑛𝑜 𝑝!" ≥ 50% = 𝑦𝑒𝑠
(21)
where:
𝑝!"= probability of taking a job j for class i
𝛽!!= utility of position for class i
𝛽!!= utility of location for class i
𝛽!!= utility of income for class i
𝛽!!= utility of holidays for class i
𝛽!!= utility of company size for class i
𝛽!!= no choice
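As an illustration, equation 21 can be sketched in code. The helpers below are hypothetical (not the thesis's actual implementation) and assume a job's sum utility is simply the sum of the part-worths of its attribute levels:

```javascript
// Binomial logit transformation of equation 21 (illustrative sketch):
// turn a class's utilities for one job into a choice probability and a
// binary yes/no prediction against the class's no-choice utility.
function choiceProbability(levelUtilities, noChoiceUtility) {
  // Sum utility of the job: sum of the part-worths of its attribute levels.
  var v = levelUtilities.reduce(function (a, b) { return a + b; }, 0);
  var expV = Math.exp(v);
  return expV / (expV + Math.exp(noChoiceUtility));
}

function predictChoice(levelUtilities, noChoiceUtility) {
  return choiceProbability(levelUtilities, noChoiceUtility) >= 0.5 ? 'yes' : 'no';
}
```

A job whose sum utility exceeds the class's no-choice utility gets a probability above 50% and is coded as 'yes'.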
After running this for each solution, the nine-class solution proved to contain the most distinct classes. It was favored over solutions that scored better on the information criteria because it discriminates better between the choice and no-choice option: the statistically optimal solutions contained many classes that would be indistinguishable from one another after applying the logit transformation to a binary choice variable (yes / no). Therefore, the seed nodes are based on the slightly suboptimal nine-class model.
Attribute / Level                           Class 1   Class 2   Class 3   Class 4   Class 5   Class 6   Class 7   Class 8   Class 9
Position
  Market researcher in a research company   -0.375    1.4762    0.0307    -1.461    0.1061    -2.7862   -0.2363   1.3126    0.4304
  Market researcher in an organization      0.1673    1.1358    -0.0652   -1.6496   -0.0055   -1.5598   -1.7017   4.8919    1.2905
  Product manager                           0.1785    -1.2679   -0.4174   1.3874    -0.5354   2.8474    4.9449    -0.8514   2.6248
  Management consultant                     0.0292    -1.3441   0.4519    1.7233    0.4349    1.4986    -3.0069   -5.3531   -4.3457
Location
  Groningen                                 -0.9003   0.6644    1.795     0.3825    -0.8701   -2.3234   1.2513    10.4116   -0.5032
  Amsterdam                                 0.4157    -0.3532   0.0267    -0.6317   1.5527    1.2954    -1.4975   -3.0463   -1.0338
  Rotterdam                                 0.2989    -0.1592   -1.4003   -0.3243   -0.3439   0.3445    -0.1548   -4.5445   -1.1551
  The Hague                                 0.1857    -0.152    -0.4213   0.5735    -0.3387   0.6835    0.401     -2.8208   2.692
Gross income                                0.0029    0.0028    0.0011    0.0022    0.0031    0.0024    0.0035    0.0065    -0.0015
Company size                                -0.0001   -0.0002   -0.0003   -0.0006   0.0003    -0.0008   -0.0012   0.0018    -0.0062
Holidays per year
  20 days                                   -0.549    -0.4815   -0.2979   -0.4431   -0.6131   -0.2922   -0.7685   -2.2661   -1.8076
  25 days                                   0.0661    -0.1909   -0.0022   0.1002    0.1404    -0.506    -0.4078   -0.01     -1.3371
  30 days                                   0.1934    0.1984    -0.0139   0.2118    0.0881    0.6421    1.045     3.3148    1.2409
  35 days                                   0.2896    0.474     0.314     0.1311    0.3846    0.1561    0.1314    -1.0387   1.9038
No choice                                   -1.6995   4.6279    -6.1534   4.4169    8.8505    7.9173    7.167     30.1338   -5.5334
Table 9: Latent class utilities
Table 10 must be interpreted as follows: job preferences with a probability < 50% are coded as 'No', whereas job preferences with a probability ≥ 50% are coded as 'Yes'. Classes one, two and three do not contain any job preferences below 50% and are therefore interpreted as identical. These duplicate classes were merged, reducing the total number of distinct classes to seven. Significance testing and interpretation of the individual attributes is irrelevant here, given that the experimental design for the collaborative filter algorithm must replicate the same attributes and attribute levels as the conjoint experiment, regardless of significance levels. The final seven seed nodes in table 10 are marked in italics.
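This coding rule can be sketched as follows (illustrative only; the probability values are taken from table 10, while the function name and data layout are assumptions):

```javascript
// Convert a latent class's predicted job probabilities (table 10) into a
// binary seed-node profile: probability >= 50% -> 1 (Yes), < 50% -> 0 (No).
function seedProfile(jobProbabilities) {
  return jobProbabilities.map(function (p) { return p >= 0.5 ? 1 : 0; });
}

// Example with the first three jobs of latent class 5: j1 falls below the
// 50% cut-off, j2 and j3 fall above it.
seedProfile([0.1526, 0.8275, 0.6411]); // -> [0, 1, 1]
```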
Job   Class 1    Class 2    Class 3    Class 4    Class 5    Class 6    Class 7    Class 8    Class 9
      (Seed 9)   (Seed 8)   (Seed 1)   (Seed 2)   (Seed 3)   (Seed 4)   (Seed 5)   (Seed 6)   (Seed 7)
j1    99.92%     98.18%     100.00%    82.02%     15.26%     1.77%      94.36%     2.32%      13.51%
j2    99.99%     98.23%     99.99%     39.79%     82.75%     3.24%      54.80%     0.00%      18.45%
j3    100.00%    99.51%     99.97%     59.92%     64.11%     5.65%      97.16%     0.01%      16.61%
j4    100.00%    99.69%     99.99%     71.56%     83.33%     3.13%      92.12%     0.01%      2.90%
j5    100.00%    99.81%     100.00%    80.75%     56.59%     2.54%      98.76%     99.99%     86.55%
j6    100.00%    99.42%     99.99%     52.38%     92.60%     31.76%     58.72%     0.01%      94.38%
j7    99.99%     96.48%     99.91%     16.13%     31.25%     1.82%      12.89%     0.00%      0.01%
j8    99.99%     95.41%     99.97%     44.60%     23.35%     2.35%      31.78%     0.00%      79.34%
j9    99.99%     94.72%     100.00%    96.34%     24.11%     26.92%     99.97%     6.10%      90.97%
j10   99.99%     64.99%     99.97%     75.52%     60.37%     83.77%     98.58%     0.00%      0.93%
j11   100.00%    93.39%     99.96%     97.62%     57.04%     91.38%     99.97%     0.00%      64.24%
j12   100.00%    87.63%     99.98%     97.60%     29.68%     92.43%     99.96%     0.00%      97.21%
j13   99.99%     91.90%     100.00%    95.43%     49.54%     5.05%      30.07%     0.76%      0.00%
j14   99.98%     51.56%     99.99%     80.34%     59.63%     53.94%     1.30%      0.00%      0.02%
j15   99.99%     78.28%     99.97%     92.41%     35.11%     48.39%     15.90%     0.00%      6.14%
j16   100.00%    85.93%     99.99%     98.44%     48.09%     81.84%     70.95%     0.00%      65.96%
Table 10: Probability of taking a job by latent class (< 50% = no, ≥ 50% = yes)
7.4 Implementation
The collaborative filter was implemented using the recommendationRaccoon engine8 (Morita, 2014). The engine uses the Jaccard similarity coefficient, a commonly used similarity measure for binary data, which is appropriate for a binary like/dislike rating system. It uses a k-nearest neighbor (k-NN) algorithm that compares participants only to their nearest neighbors in order to keep the calculations fast. The code was altered to work with the job preference data model. Representative user profiles derived from the a priori latent class analysis were uploaded into the database according to the information in table 10, with the cells labeled in red marked as 0 (No) and the other cells marked as 1 (Yes). Refer to appendices 3 to 6 for details on the implementation.

8 Github repositories for recommendationRaccoon:
https://github.com/guymorita/Mosaic-Films---Recommendation-Engine-Demo
https://github.com/guymorita/recommendationRaccoon
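For intuition, the Jaccard coefficient on binary like data can be sketched as follows. This is a simplified illustration over liked job ids only, not the engine's actual source:

```javascript
// Jaccard similarity between two users' sets of liked jobs:
// |A intersect B| / |A union B|, ranging from 0 (nothing in common)
// to 1 (identical like sets).
function jaccard(likesA, likesB) {
  var a = new Set(likesA);
  var b = new Set(likesB);
  var inter = 0;
  a.forEach(function (x) { if (b.has(x)) inter++; });
  var union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Two profiles sharing 2 of 4 distinct liked jobs:
jaccard(['j1', 'j2', 'j3'], ['j2', 'j3', 'j4']); // -> 0.5
```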
7.5 Experimental setting
The experimental design uses two groups: an experimental group that receives job recommendations from a collaborative filter based on a seeded database, and a control group that receives recommendations from an untrained collaborative filter. A short, descriptive text outlining the purpose of the study is shown. A web interface was built in which participants are asked to rate job recommendations from the collaborative filter (refer to figure 1 and appendix 1). Each participant receives 16 recommendations. After each answer, the database is updated and the collaborative filter provides a new, more accurate recommendation.
Figure 1: Collaborative Filter
7.6 Predictive validity
The predictive validity of the collaborative filter experiment is determined using hit rates. However, there is a crucial caveat when using hit rates for recommender systems. The hit rate increases when observed and predicted values correspond. This implies that if a job is recommended and is selected by the participant (1 / yes), the hit rate increases. The hit rate can also increase when the observed value is 0 and the predicted value is 0. This never occurs in collaborative filtering: recommending a job implies a predicted 1 / yes, and when no good recommendations are left, the recommendation engine simply exhausts all the jobs in the database, recommending even those jobs for which it predicts 0 / no. That means that, after all good options in the database are depleted, the performance of the recommendation engine dips. This is only a problem for databases with a very small, humanly exhaustible number of jobs, such as in this controlled experiment, where a limited number of good job options are available. The hit
rate is calculated after each nth recommendation for each respondent to plot the evolution of the predictive strength of the collaborative filter. The hit rate provides an indication of predictive validity at the individual level (Melles, Laumann & Holling, 2000). After computing the hit rate of the nth recommendation for all participants, the mean hit rate per nth recommendation is computed as an overall performance statistic.
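The aggregation step can be sketched as follows (the data layout is an assumption: one array of 0/1 hit flags per respondent):

```javascript
// Mean hit rate per nth recommendation across respondents.
// hits[r][n] is 1 if the nth recommendation to respondent r was a hit
// (observed and predicted values correspond) and 0 otherwise.
function meanHitRates(hits) {
  var nRecs = hits[0].length;
  var means = [];
  for (var n = 0; n < nRecs; n++) {
    var sum = 0;
    for (var r = 0; r < hits.length; r++) sum += hits[r][n];
    means.push(sum / hits.length);
  }
  return means;
}

// Two respondents, two recommendations each:
meanHitRates([[1, 0], [1, 1]]); // -> [1, 0.5]
```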
8 Results
Two groups of 50 respondents (seed vs. benchmark condition) were analyzed. The student sample consists of 51% males (Seed: N = 27, Benchmark: N = 24) and 49% females (Seed: N = 23, Benchmark: N = 26). Age was distributed evenly
(Seed: M = 23.84, SD = 2.159, Benchmark: M = 23.60, SD = 2.237). Each respondent was approached and asked to participate in person at the Faculty of Economics and Business of the University of Groningen.
The 1st job was not recommended by the collaborative filter, but was hardcoded into the software; ergo, every respondent started out evaluating the same job (see j1 in table 4). The CF starts recommending jobs at the 2nd choice and, in this experiment, recommends jobs until all choices are exhausted. When there are no good recommendations left (predicted probability ≥ 50%), the CF continues to recommend jobs even though they may have low predicted probabilities (< 50%), until each job has been recommended. Thus, when all strong predictions are exhausted, jobs are recommended for which the CF predicts that the no-choice option is preferred over the choice option. In other words, a perfect CF orders the jobs such that all jobs for which the choice option is preferred appear first and all jobs for which the no-choice option is preferred appear last. Due to this noise, the hit rates on the second half of nth recommendations do not accurately reflect the performance of the CF. A steep decline in hit rate is thus expected after all good recommendations are exhausted and the algorithm is forced to recommend jobs with low predicted probabilities. This drop-off point occurs after the 9th choice (or 8th recommendation), as can be seen in figure 2. Therefore, the insights this
study provides are based on choices 2-9. Hit rates were calculated on an aggregate level (mean hit rate of all respondents per choice) to track the evolution of the overall performance of the CF over consecutive job choices (see figure 2 and table 12).
Figure 2: Hit rate evolution over consecutive job choices (seed vs. benchmark)
Figure 2 provides overall insight into the performance of the seed condition compared to the benchmark condition. Taking into account the cut-off point after the 9th job choice, the seed recommendations outperform the benchmark recommendations at each nth choice. In order to gain insight into the cold start problem, the hit rates for each choice were also examined per individual respondent (see appendix 2). The seed condition outperformed the benchmark condition as the database became more populated, but there is no conclusive evidence that the seed condition differed from the benchmark condition before the 15th respondent.
[Line chart: hit rate in % (y-axis, 0-90%) over consecutive job choices 1-16 (x-axis), showing the mean hit rate per choice for the benchmark condition and for the seed condition (choices 2-16); the underlying values are listed in table 12.]
Recommendation    Benchmark                           Seed
                  Mean hit rate   Prediction error    Mean hit rate   Prediction error
1                 60%             40%                 76%             24%
2                 64%             36%                 82%             18%
3                 62%             38%                 64%             36%
4                 60%             40%                 62%             38%
5                 52%             48%                 56%             44%
6                 48%             52%                 66%             34%
7                 54%             46%                 62%             38%
8                 46%             54%                 72%             28%
9                 44%             56%                 38%             62%
10                40%             60%                 56%             44%
11                34%             66%                 50%             50%
12                40%             60%                 38%             62%
13                60%             40%                 22%             78%
14                42%             58%                 20%             80%
15                38%             62%                 38%             62%
Table 12: Hit rate and prediction error per choice
9 Conclusions and recommendations
9.1 Findings
This study showed how the integration of seed nodes derived from a conjoint experiment can boost the performance of a collaborative filter. The main goals of this study were to link conjoint analysis to collaborative filtering in a way that addresses the cold start problem, and to evaluate whether the development of a fully automated conjoint recommender is worthwhile. The cold start problem was partially addressed. On the aggregate level, the seed condition predicted the preferences of new respondents between 2% and 36% more accurately than the benchmark. This means that seeding the database with latent classes resulted in performance gains, effective even for sparsely populated databases (N < 50). This suggests that training the collaborative filter with latent classes is a worthwhile optimization technique. As mentioned in the previous chapter, both conditions suffer from a decline in hit rates after the 9th consecutive choice. The main explanation for this pattern is that the controlled experiment used only 16 jobs, for reasons discussed earlier, and that the experimental design was set up such that each participant evaluated every job.
Imagine a participant likes 9 out of 16 jobs in the database and, in the most extreme case, the recommender system predicts the responses perfectly. The collaborative filter will then try to recommend the 9 jobs that the participant likes first. After the 9th choice, only disliked jobs remain. Having depleted all good predictions, the filter will naively continue to recommend the disliked jobs. Under these perfect conditions, the hit rate would drop dramatically from 100% to 0% after the 9th choice. In real-world situations this would hardly ever occur, since most e-commerce sites carry a vast product catalog, containing perhaps hundreds of thousands of products, that is virtually impossible for its users to deplete.
To drill down on the results further, the cumulative hit rates per respondent were examined for each of the eight relevant choices (2-9). No clear pattern emerges within the first 15 respondents. After the 15th respondent, the two conditions stabilize, with the seed condition slightly outperforming the benchmark condition. In this sense, the cold start problem still remains. Nevertheless, this study has shown that there is merit in linking conjoint analysis with collaborative filtering: overall performance can be increased and the cold start problem is partially, though not entirely, addressed.
9.2 Limitations
Several limitations were present in the analysis. The optimal latent class solution contained many classes that would be indistinguishable from one another after the logit transformation to a binary choice variable (yes / no), so a suboptimal model was selected in the context of this study. Covariates were not taken into account when building the latent classes; this would have resulted in slightly different utility levels. However, since attribute significance was irrelevant in this study and because the cut-off point for the binary response variable after the logit transformation was set at 50% probability, slight changes in utility levels would have produced the same indistinguishable classes described in the latent class section of the results chapter. Therefore, the adverse impact is limited. Another limitation is that the respondents consisted mainly of master students who are about to enter the labor market. They may therefore consider jobs that they would not normally consider as a way of reducing uncertainty. This may or may not
have resulted in higher hit rates. However, this applies to both the seed and the benchmark condition; thus, the interpretation of the results remains valid.
9.3 Further research
Although this study is a steppingstone toward building a true conjoint
recommender, the procedure outlined in this paper can be applied in practice today.
Marketing managers should consider integrating ad hoc conjoint experiments to optimize
their recommendation engines, distinguishing between users and seed nodes. This study
takes a progressive step toward building a hybrid top-N recommender based on a fully
automated conjoint procedure where each top-N recommendation list is computationally
treated as a conjoint choice set. Future research should modify the algorithm such that it
automatically derives utility levels and transforms this into binary predictions much like
the ad hoc conjoint procedure that was performed in this study. Researchers should also
develop a self-categorizing system that automatically derives and designs attribute levels,
preferably taking into account non-domination, orthogonality and balance. A modified
version of TF-IDF holds promise in this area. This is a prerequisite step toward creating a
fully automated conjoint recommender. Finally, a hypothetical automated conjoint
recommender should display choice sets similar to top-N recommenders where more than
one recommendation is provided. This approach will make the learning process more
efficient, since it compares not only the utilities with the no choice option, but also with
other utility values. Major online retailers like Amazon already use top-N
recommendations extensively (See figure 3), so it would not impact the customer
experience from a usability standpoint as the visual similarity between top-N
recommendation lists and conjoint experiments is evident (See figure 4).
Figure 3: Amazon.com top-N book recommendations
                 Book 1                    Book 2                    Book 3                    None of these
Author           Nick Bostrom              Robin Hanson              William Hertling
Length           350-400 pages             >= 500 pages              300-350 pages
Genre            Artificial intelligence   Artificial intelligence   Artificial intelligence
Price            $12.-                     $3.-                      $5.-
Type             Soft cover                Ebook                     Ebook
Choice           ●                         ○                         ○                         ○
Figure 4: Preference measurement for conjoint study
10 References
Adomavicius, G. & Tuzhilin, A. (2005). Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749. doi:10.1109/TKDE.2005.99
Allenby, G. M. & Rossi, P. E. (2006). Hierarchical Bayes Models. The Handbook of
Marketing Research: Uses, Misuses, and Future Advances, 1–59.
doi:10.1111/j.1749-6632.2009.05310.x
Allison, O. (2012, 8 28). How Important Are Extracurricular Activities For Admissions
To Top-Tier Universities? Retrieved 3 4, 2015, from Forbes:
http://www.forbes.com/sites/quora/2012/08/28/for-top-students-at-elite-prep-and-
magnet-high-schools-how-important-are-extracurricular-activities-for-admissions-
to-top-tier-universities/
Al-Otaibi, S. (2012). A survey of job recommender systems. International Journal of the
Physical Sciences, 7(29), 5127–5142. doi:10.5897/IJPS12.482
Schein, A. I., Popescul, A. & Ungar, L. H. (2002). Methods and Metrics for Cold-Start Recommendations. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02). doi:10.1145/564376.564421
Beutel, A., Weimer, M. & Minka, T. (2014). Elastic Distributed Bayesian Collaborative
Filtering. Proceedings of the NIPS Distributed Machine Learning workshop.
Bobadilla, J., Ortega, F., Hernando, A. & Alcalá, J. (2011). Improving collaborative
filtering recommender system results and performance using genetic algorithms.
Knowledge-Based Systems, 24(8), 1310–1316. doi:10.1016/j.knosys.2011.06.005
Caplan, B. (2011, 11 18). How Elite Firms Hire: The Inside Story. Retrieved 3 4, 2015,
from Library of Economics and Liberty:
http://econlog.econlib.org/archives/2011/11/how_elite_firms.html
Chapman, C. N., Alford, J. L., Johnson, C., Weidemann, R. & Lahav, M. (2009). CBC
vs. ACBC: Comparing Results with Real Product Selection. Sawtooth Software
Research Paper, 98382(360).
Chirita, P.-A., Nejdl, W. & Zamfir, C. (2005). Preventing shilling attacks in online
recommender systems. Proceedings of the Seventh ACM International Workshop on
Web Information and Data Management WIDM 05, 55, 67.
doi:10.1145/1097047.1097061
Chopra, S. (2003). Designing the distribution network in a supply chain. Transportation
Research Part E: Logistics and Transportation Review, 39, 123–140.
doi:10.1016/S1366-5545(02)00044-3
Ding, M., Park, Y.-H. & Bradlow, E. T. (2009). Barter Markets for Conjoint Analysis.
Management Science, 55(6), 1003–1017. doi:10.1287/mnsc.1090.1003
Eggers, F. & Sattler, H. (2009). Hybrid individualized two-level choice-based conjoint
(HIT-CBC): A new method for measuring preference structures with many attribute
levels. International Journal of Research in Marketing, 26(2), 108–118.
doi:10.1016/j.ijresmar.2009.01.002
Eggers, F. & Sattler, H. (2011). Preference Measurement with Conjoint Analysis:
Overview of state of the art approaches and recent developments, Gfk Marketing
Intelligence Review, 3(1), 36–47.
Ekstrand, M. D. (2010). Collaborative Filtering Recommender Systems. Foundations and
Trends in Human Computer Interaction, 4(2), 81–173. doi:10.1561/1100000009
Elkin, N., Boyle, C., Dolliver, M., Fisher, L. T., Hudson, M., Hallerman, D., et al. (2014).
Key Digital Trends For 2015: What’s in Store—and Not in Store—for the Coming
Year. New York: eMarketer.
Goldberg, D., Nichols, D., Oki, B. M. & Terry, D. (1992). Using collaborative filtering to
weave an information Tapestry. Communications of the ACM - Special issue on
information filtering, 35(12), 61–70.
Green, P. E., Krieger, A. M. & Bansal, P. (1988). Completely Unacceptable Levels in
Conjoint Analysis: A Cautionary Note. Journal of Marketing Research, 25(August),
293–300. doi:10.2307/3172532
Green, P. E. & Srinivasan, V. (1989). Conjoint Analysis in Marketing: New
Developments With Implications for Research and Practice. Journal of Marketing,
54(4), 3–19.
Hill, W., Stead, L., Rosenstein, M. & Furnas, G. (1995). Recommending and evaluating
choices in a virtual community of use. Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems, 194–201. doi:10.1145/223904.223929
Jannach, D. & Friedrich, G. (2011). Tutorial: Recommender Systems. International Joint
Conference on AI.
Kahneman, D., Knetsch, J. L., & Thaler, R. H. (1990). Experimental tests of the
endowment effect and the Coase theorem. Journal of political Economy, 1325-1348.
Kahneman, D. & Tversky, A. (1984). Choices, values, and frames. American
Psychologist, 39, 341–350. doi:10.1037/0003-066X.39.4.341
Kaminskas, M. & Bridge, D. (2014). Measuring Surprise in Recommender Systems. In Workshop on Recommender Systems Evaluation: Dimensions and Design.
Kim, B. M. (2003). Clustering approach for hybrid recommender system. Proceedings
IEEE/WIC International Conference on Web Intelligence (WI 2003), 33–38.
doi:10.1109/WI.2003.1241167
Kolodner, J. (1983). Reconstructive Memory: A Computer Model. Cognitive Science, 7(4), 281–328.
Konstan, J. A. (2004). Introduction to recommender systems. ACM Transactions on Information Systems, 22(1), 1–4. doi:10.1145/963770.963771
Kramer, T. (2007). The Effect of Measurement Task Transparency on Preference
Construction and Evaluations of Personalized Recommendations. Journal of
Marketing Research, 44, 224–233. doi:10.1509/jmkr.44.2.224
Kübler, S. (2005). Memory-Based Parsing. Computational Linguistics, 31, 419–422.
doi:10.1162/089120105774321082
Kuhfeld, W. F. (2010). Conjoint Analysis. SAS Technical Papers, MR2010H, 681–801.
Lamere, P. & Green, S. (2008). Project Aura: Recommendation for the rest of us. Presented at JavaOne.
Linden, G., Smith, B. & York, J. (2003). Amazon.com recommendations: Item-to-item
collaborative filtering. IEEE Internet Computing, 7(February), 76–80.
doi:10.1109/MIC.2003.1167344
Liu, N. (2011). Wisdom of the Better Few: Cold Start Recommendation via Representative Based Rating Elicitation. Proceedings of the 5th ACM Conference on Recommender Systems (RecSys '11), 37–44. doi:10.1145/2043932.2043943
Louviere, J. J. & Woodworth, G. (1983). Choice Allocation Consumer Experiments: An
Approach Aggregate Data. Journal of Marketing, 20(4), 350–367.
Maier, C., Laumer, S., Eckhardt, A. & Weitzel, T. (2012). When Social Networking
Turns to Social Overload: Explaining the Stress, Emotional Exhaustion, and
Quitting Behavior from Social Network Sites’ Users. European Conference on
Information Systems (ECIS) Proceedings, 1–12.
Mannan, N. B., Sarwar, S. M. & Elahi, N. (2014). A New User Similarity Computation Method for Collaborative Filtering Using Artificial Neural Network. In Engineering Applications of Neural Networks, 145–154.
Melles, T., Laumann, R. & Holling, H. (2000). Validity and Reliability of Online
Conjoint Analysis. Proceedings of the Sawtooth Software Conference, (March), 31–
40.
Moore, J. (2011, 3 22). Building a recommendation engine, foursquare style. Retrieved 3
2, 2015, from Foursquare: http://engineering.foursquare.com/2011/03/22/building-a-
recommendation-engine-foursquare-style/
Morita, M. & Shinoda, Y. (1994). Information filtering based on user behavior analysis and best match text retrieval. Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 272–281.
Nadi, S., Saraee, M. H., Jazi, M. D. & Bagheri, A. (2011). FARS: Fuzzy Ant based
Recommender System for Web Users. International Journal of Computer Science
Issues, 8(1), 203–209.
Netzer, O., Toubia, O., Bradlow, E. T., Dahan, E., Evgeniou, T., Feinberg, F. M., & Rao, V. R. (2008). Beyond Conjoint Analysis: Advances in Preference Measurement. Marketing Letters, 19(3), 337–354.
Netzer, O. & Srinivasan, V. (2011). Adaptive self-explication of multi-attribute
preferences. Journal of Marketing Research, 48(1), 140–156.
Oakes, C. (1999, 8 12). Firefly's Dim Light Snuffed Out. Retrieved 2 23, 2015, from
Wired.com: http://archive.wired.com/culture/lifestyle/news/1999/08/21243
Pronk, V., Verhaegh, W., Proidl, A., & Tiemann, M. (2007). Incorporating user control
into recommender systems based on naive bayesian classification. Proceedings of
the ACM Conference On Recommender Systems, 73–80.
doi:10.1145/1297231.1297244
Resnick, P., Iacovou, N. & Suchak, M. (1994). GroupLens: an open architecture for
collaborative filtering of netnews. Proceedings of the CSCW '94 Proceedings of the
1994 ACM conference on Computer supported cooperative work, 175–186.
doi:10.1145/192844.192905
Rogers, B. P., Puryear, R. & Root, J. (n.d.). Infobesity: The enemy of good decisions. Retrieved 6 6, 2015, from Bain Review: http://www.bain.com/publications/articles/infobesity-the-enemy-of-good-decisions.aspx
Salihefendic, A. (2010, 10 11). How Hacker News ranking algorithm works. Retrieved 3
2, 2015, from amix.dk: http://amix.dk/blog/post/19574
Salihefendic, A. (2010, 11 23). How Reddit ranking algorithms work. Retrieved 3 2,
2015, from amix.dk: http://amix.dk/blog/post/19588
Scholz, S. W., Meissner, M. & Decker, R. (2010). Measuring Consumer Preferences for
Complex Products: A Compositional Approach Based on Paired Comparisons.
Journal of Marketing Research, 47(August), 685–698. doi:10.1509/jmkr.47.4.685
Senior, D. (2015, 2 7). Search Stops Here, Starts Afresh On Mobile . Retrieved 2 20,
2015, from Techcrunch: http://techcrunch.com/2015/02/07/search-stops-here-starts-
afresh-on-mobile/
Schank, R. (1999). Dynamic memory: A theory of reminding and learning in computers and people (2nd ed.), 1–13. Cambridge University Press.
Shardanand, U. (1994). Social information filtering for music recommendation.
Proceedings of the SIGCHI conference on Human factors in computing systems.
210–217.
Tan, P., Steinbach, M. & Kumar, V. (2007). Examples of proximity measures. In
Introduction to Data Mining, 73–74, Boston, MA: Pearson Addison Wesley.
Terveen, L., Hill, W., Amento, B., McDonald, D. & Creter, J. (1997). PHOAKS: a
system for sharing recommendations. Communications of the ACM, 40(March), 59–
62. doi:10.1145/245108.245122
Thompson, C. (2008, 11 21). If You Liked This, You’re Sure to Love That. Retrieved 3 3,
2015, from NY Times: http://www.nytimes.com/2008/11/23/magazine/23Netflix-
t.html?pagewanted=all&_r=0
Tintarev, N. & Masthoff, J. (2011). Recommender Systems Handbook. Recommender
Systems Handbook (Vol. 54, pp. 479–510). doi:10.1007/978-0-387-85820-3
Wen, Z. (2008). Recommendation System Based on Collaborative Filtering. Unpublished manuscript, Department of Computer Science, Stanford University, Stanford, CA.
Wu, L. & Choi, S. (2014). The Browsemaps: Collaborative Filtering at LinkedIn.
Proceedings of the 6th Workshop on Recommender Systems and the Social Web
(RSWeb 2014), Foster City, CA: CEUR-WS.
Xiao, B. & Benbasat, I. (2007). E-Commerce Product Recommendation Agents: Use, Characteristics, and Impact. MIS Quarterly, 31(1), 137–209.
Zhou, K., Yang, S. & Zha, H. (2011). Functional Matrix Factorizations for Cold-Start
Recommendation. Proceedings of the 34th International ACM SIGIR Conference on
Research and Development in Information Retrieval, 315–324.
doi:10.1145/2009916.2009961
Zhou, Y., Wilkinson, D., Schreiber, R. & Pan, R. (2008). Large-scale parallel
collaborative filtering for the netflix prize. Lecture Notes in Computer Science
(including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), 5034 LNCS, 337–348. doi:10.1007/978-3-540-68880-8_32
11 Appendices

Appendices 3 to 6 consist of the source code for the implementation of the collaborative filter, the survey and the job importer. To replicate, install the following dependencies from the terminal using brew and npm:

$ brew install nodejs
$ brew install npm
$ brew install redis
$ brew install mongodb
$ brew install git
$ git clone https://github.com/guymorita/Mosaic-Films---Recommendation-Engine-Demo.git thesis
$ cd thesis
$ npm install
$ npm install body-parser
$ npm install express
$ npm install mongo
$ npm install mongoose
$ npm install raccoon@0.2.3
$ npm install redis
$ redis-server

Edit and replace node_modules/raccoon/lib/config.js:

exports.config = function(){
  return {
    nearestNeighbors: 5,
    className: 'movie',
    numOfRecsStore: 30,
    sampleContent: true,
    factorLeastSimilarLeastLiked: false,
    localMongoDbURL: 'mongodb://localhost/users',
    remoteMongoDbURL: process.env.MONGO_HOSTAUTH,
    localRedisPort: 6379,
    localRedisURL: '127.0.0.1',
    remoteRedisPort: 12000,
    remoteRedisURL: process.env.REDIS_HOST,
    remoteRedisAuth: process.env.REDIS_AUTH,
    flushDBsOnStart: true,
    localSetup: true
  };
};

Then start the server and open the survey in a browser:

$ node surveyserver.js

Browse to http://localhost:3000
11.1.1 Appendix 1: Custom Survey
Demographic information asked prior to CF experiment.
11.1.2 Appendix 2: Individual level cumulative hit rates per nth recommendation

Cumulative hit rates per respondent (red = seed, blue = benchmark)

[Eight line charts, one per choice (1st to 8th choice): cumulative hit rate in % (y-axis, 0-100%) against respondents in chronological order (x-axis, respondents 1-49).]
11.1.3 Appendix 3: surveyserver.js
var express = require('express');
var app = express();
var path = require('path');
var bodyParser = require('body-parser');
var mongoose = require('mongoose');
var raccoon = require('raccoon');

app.use(express.static(path.join(__dirname, 'public')));
app.use(bodyParser.urlencoded({extended: false}));

var Schema = mongoose.Schema;

// connect to mongodb
mongoose.connect('mongodb://127.0.0.1/thesis');
raccoon.connect(6379, '127.0.0.1');
// raccoon.flush();

// User schema
var userSchema = new mongoose.Schema({
    age: Number,
    gender: Boolean,
    workexperience: Boolean,
    highesteducation: Boolean,
    majorminor: String,
    citieslived: Array,
    citiesfamily: Array,
    citiesfriends: Array,
    uid: String
});

// Item schema (must match the itemid field seeded by jobs.js,
// which the /retrieveitem route queries on)
var itemSchema = new mongoose.Schema({
    itemid: String,
    jobtitle: String,
    place: String,
    companysize: Number,
    holidays: Number,
    salary: Number
});
// Rating schema
var ratingSchema = new mongoose.Schema({
    uid: String,
    action: String,
    date: Date
});

var User = mongoose.model('User', userSchema);
var Item = mongoose.model('Item', itemSchema);
var Rating = mongoose.model('Rating', ratingSchema);

// make this available to our users in our Node applications
module.exports = User;

var insertRating = function(uid, action) {
    var data = {
        uid: uid,
        action: action,
        date: new Date()
    };
    var rating = new Rating(data);
    rating.save(function(err) {
        if (err) {
            throw err;
        }
        console.log('Rating created');
    });
}

// save the user to the database
var insertUser = function(age, gender, workexperience, highesteducation, majorminor, citieslived, citiesfamily, citiesfriends, uid){
    var userData = {
        age: age,
        gender: gender,
        workexperience: workexperience,
        highesteducation: highesteducation,
        majorminor: majorminor,
        citieslived: citieslived,
        citiesfamily: citiesfamily,
        citiesfriends: citiesfriends,
        uid: uid
    };

    console.log(userData);

    var user = new User(userData);
    user.save(function(err) {
        if (err) {
            throw err;
        }
        console.log('User created');
    });
};

///////////////////////////////
// Fill recommend dummy data //
///////////////////////////////

raccoon.liked('seed_user', 'item1', function(){});
raccoon.liked('seed_user', 'item2', function(){});

////////////////////////////
// Generate unique userid //
////////////////////////////

var generateUid = function () {
    // Math.random should be unique because of its seeding algorithm.
    // Convert it to base 36 (numbers + letters), and grab the first 9 characters after the decimal.
    return '_' + Math.random().toString(36).substr(2, 9);
};

//////////////////////////
// Start node.js server //
//////////////////////////

// GET method route
app.get('/retrieveitem', function(req, res){
    console.log('Whats in the box? ' + req.query.itemid); // on a GET request, the parameters arrive in req.query
    Item.findOne({ 'itemid': req.query.itemid }, null, function (err, item) {
        res.send(item.toObject());
    })
});

// POST method route
app.post("/save_demographics", function (req, res) {
    console.log(req.body);
    var userUid = generateUid();
    insertUser(req.body.age, req.body.gender, req.body.workexperience,
        req.body.highesteducation, req.body.majorminor,
        req.body['citieslived[]'] ? req.body['citieslived[]'] : [req.body.citieslived],
        req.body['citiesfamily[]'] ? req.body['citiesfamily[]'] : [req.body.citiesfamily],
        req.body['citiesfriends[]'] ? req.body['citiesfriends[]'] : [req.body.citiesfriends],
        userUid);
    res.send(userUid); // return the generated uid as the response body
});

app.post("/like", function (req, res) {
    insertRating(req.body.userid, 'like');
    raccoon.liked(req.body.userid, req.body.itemid, function(){
        raccoon.recommendFor(req.body.userid, 1, function(results){
            console.log('Recommended results: ' + results);
            res.send(results[0]);
        });
    });
});

app.post("/dislike", function (req, res) {
    insertRating(req.body.userid, 'dislike');
    raccoon.disliked(req.body.userid, req.body.itemid, function(){
        raccoon.recommendFor(req.body.userid, 1, function(results){
            console.log('Recommended results: ' + results);
            res.send(results[0]);
        });
    });
});

var server = app.listen(3000, function() { });
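The generateUid helper above relies on Math.random alone, which gives no hard uniqueness guarantee. A short sketch of the same scheme with an added counter suffix that rules out collisions within one server process (generateSafeUid is a hypothetical name, not part of the thesis code):

```javascript
// Same '_' + base-36 fragment scheme as generateUid in surveyserver.js.
// A monotonically increasing counter is appended as a hedge against the
// (unlikely) case that Math.random yields the same fragment twice.
var uidCounter = 0;
function generateSafeUid() {
  uidCounter += 1;
  return '_' + Math.random().toString(36).substr(2, 9) + uidCounter.toString(36);
}

var a = generateSafeUid();
var b = generateSafeUid();
console.log(a !== b); // true: the counter suffix forces distinct ids
```

For the sample sizes in this study (about 50 respondents) the random fragment alone is almost certainly sufficient; the counter simply makes the guarantee explicit.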
11.1.4 Appendix 4: jobrecsys.html
<!DOCTYPE html>
<html>
<head>
    <title>Thesis: Job RecSys</title>
    <link rel="stylesheet" type="text/css" href="css/stylesheet.css"/>
    <link href="bootstrap/css/bootstrap.min.css" rel="stylesheet">
</head>
<body>
    <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
    <!-- Include all compiled plugins (below), or include individual files as needed -->
    <script src="bootstrap/js/bootstrap.min.js"></script>

    <img src="img/logo.jpg" class="pull-right" height="100px" width="100px"/>

    <div class="container" align="center">
        <div class="bg">
            <br>Would you take this job, if it were offered to you?
            <div><h5><div class="title" id="jobtitle">default</div></h5>
            <table border="0px" width="240px">
                <tr>
                    <td>location:</td>
                    <td><div class="feature" id="place">default</div></td>
                </tr>
                <tr>
                    <td>company size:</td>
                    <td><div class="feature" id="companysize">default</div></td>
                </tr>
                <tr>
                    <td>holidays per year:</td>
                    <td><div class="feature" id="holidays">default</div></td>
                </tr>
                <tr>
                    <td>gross salary:</td>
                    <td><div class="feature" id="salary">€ default</div></td>
                </tr>
            </table>
            </div>
            <div class="positionlikes">
                <button class="like">yes</button>
                <button class="dislike">no</button>
            </div>
        </div>
    </div>

    <script>
        var globalUserid = window.location.href.split("userid=")[1];
        var globalItemid = "item1";
        var numberItems = 16;
        var allItems = [];
        for (var i = 1; i < numberItems + 1; i++) {
            allItems.push(i);
        }
        function removeItemId(itemid) {
            var itemnum = parseInt(itemid.split("item")[1]);
            for (var i = 0; i < allItems.length; i++) {
                if (allItems[i] == itemnum) {
                    allItems.splice(i, 1);
                }
            }
        }

        function loadItem(itemid) {
            $.get("/retrieveitem?itemid=" + itemid, (function(item) {
                console.log(item);
                console.log(item.itemid);
                console.log(item.jobtitle);
                console.log(item.place);
                console.log(item.companysize);
                console.log(item.holidays);
                console.log(item.salary);

                $(".title").html(item.jobtitle);
                $("#place").html(item.place);
                $("#companysize").html(item.companysize);
                $("#holidays").html(item.holidays);
                $("#salary").html(item.salary);
            }));
        }

        loadItem(globalItemid);
        console.log("SETTING INITIAL ITEM");

        $('.like,.dislike').click(function(){
            removeItemId(globalItemid);
            var action = 'like';
            if ($(this).hasClass('dislike')) {
                action = 'dislike';
            }
            $.post(action, {userid: globalUserid, itemid: globalItemid}).done(function(data) {
                console.log("Next item recommended: " + data);
                if (allItems.length) {
                    if (data) {
                        globalItemid = data;
                    } else {
                        globalItemid = 'item' + allItems[0];
                    }
                    loadItem(globalItemid);
                } else {
                    alert('Thank you for participating!');
                }
            });
        });
    </script>
</body>
</html>
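The click handler in jobrecsys.html falls back to the first unrated item whenever the recommender returns nothing, and ends the survey once all 16 items are rated. That selection logic can be isolated and tested on its own (nextItem is a hypothetical name introduced here for illustration, not part of the listing):

```javascript
// Mirrors the fallback in jobrecsys.html: prefer the recommended item,
// otherwise take the first item that has not been rated yet, and signal
// the end of the survey (null) once every item has been rated.
function nextItem(recommended, remaining) {
  if (remaining.length === 0) return null;   // survey finished
  if (recommended) return recommended;       // recommender had a suggestion
  return 'item' + remaining[0];              // fallback: first unrated item
}

console.log(nextItem('item7', [3, 5, 7])); // 'item7'
console.log(nextItem('', [3, 5]));         // 'item3'
console.log(nextItem('', []));             // null
```

The fallback matters early in each session: until the respondent has rated at least one item, the collaborative filter has nothing to recommend, which is precisely the cold-start situation this thesis investigates.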
11.1.5 Appendix 5: thesis.html
<!DOCTYPE html>
<html>
<head>
    <title>Thesis: Job Recommendation</title>
    <link rel="stylesheet" type="text/css" href="css/stylesheet.css"/>
    <!-- Bootstrap -->
    <link href="bootstrap/css/bootstrap.min.css" rel="stylesheet">
    <!-- <script type="text/javascript" src="js/script.js"></script>-->
</head>
<body>
    <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
    <!-- Include all compiled plugins (below), or include individual files as needed -->
    <script src="bootstrap/js/bootstrap.min.js"></script>
    <img src="img/logo.jpg" class="pull-right" height="100px" width="100px"/>

    <div class="container">
        <!-- FORM -->
        <form action="/save_demographics" method="post" autocomplete="off">
            <fieldset>
                <legend>Demographic information</legend>
                <p>Please provide the following information:</p>
                <!-- Age in years -->
                Age in years:<br>
                <input type="text" name="age" placeholder="e.g. 24" required>
                <br>
                <!-- Gender (0 = female, 1 = male) -->
                Gender: <br>
                <input type="radio" name="gender" value="male"> Male<br>
                <input type="radio" name="gender" value="female"> Female<br>
                <br>
                <!-- Previous work experience (0 = no, 1 = yes) -->
                Previous work experience: <br>
                <input type="radio" name="workexperience" value="no"> No<br>
                <input type="radio" name="workexperience" value="yes"> Yes<br>
                <br>
                <!-- Highest education: B.Sc., M.Sc., PhD -->
                Please indicate your highest level of education: <br>
                <input type="radio" name="highesteducation" value="bsc"> B.Sc.<br>
                <input type="radio" name="highesteducation" value="msc"> M.Sc.<br>
                <input type="radio" name="highesteducation" value="phd"> PhD<br>
                <br>
                <!-- Major / Minor -->
                Please indicate your major or minor:<br>
                <input type="text" name="majorminor" placeholder="e.g. Marketing" required>
                <br>
            </fieldset>
            <br>
            <!-- Cities in which the respondent has lived, including a 25 km radius
                 (dummy variable for each of the cities of the experimental design
                 with 0 = no, 1 = yes) -->
            <fieldset>
                <legend>Exposure to Dutch cities</legend>
                Check all cities in which you have <b>lived</b>, including a 25km radius: <br>
                <input type="checkbox" name="citieslived" value="Groningen"> Groningen<br>
                <input type="checkbox" name="citieslived" value="Amsterdam"> Amsterdam<br>
                <input type="checkbox" name="citieslived" value="Rotterdam"> Rotterdam<br>
                <input type="checkbox" name="citieslived" value="The Hague"> The Hague<br>
                <br>
                <!-- Cities in which the respondent has family (same structure as above) -->
                Check all cities in which you have <b>family</b>, including a 25km radius: <br>
                <input type="checkbox" name="citiesfamily" value="Groningen"> Groningen<br>
                <input type="checkbox" name="citiesfamily" value="Amsterdam"> Amsterdam<br>
                <input type="checkbox" name="citiesfamily" value="Rotterdam"> Rotterdam<br>
                <input type="checkbox" name="citiesfamily" value="The Hague"> The Hague<br>
                <br>
                <!-- Cities in which the respondent has friends (same structure as above) -->
                Check all cities in which you have <b>friends</b>, including a 25km radius: <br>
                <input type="checkbox" name="citiesfriends" value="Groningen"> Groningen<br>
                <input type="checkbox" name="citiesfriends" value="Amsterdam"> Amsterdam<br>
                <input type="checkbox" name="citiesfriends" value="Rotterdam"> Rotterdam<br>
                <input type="checkbox" name="citiesfriends" value="The Hague"> The Hague<br>
                <br>
            </fieldset>
            <br>
            <!-- Submit -->
            <input type="submit" value="Continue">
        </form>
    </div>

    <script>
        // Serialize form data -> convert to a JSON-style object
        $.fn.serializeObject = function() {
            var o = {};
            var a = this.serializeArray();
            $.each(a, function() {
                if (o[this.name] !== undefined) {
                    if (!o[this.name].push) {
                        o[this.name] = [o[this.name]];
                    }
                    o[this.name].push(this.value || '');
                } else {
                    o[this.name] = this.value || '';
                }
            });
            return o;
        };

        $(function() {
            $('form').submit(function() {
                // jQuery -> output form data as a JSON-style object
                // var result = JSON.stringify($('form').serializeObject());
                var result = $('form').serializeObject();
                console.log(result);
                // jQuery -> send to Node
                // $.post("save_demographics?userid=" + uidpackage, result).done(function(data) {
                $.post("save_demographics", result).done(function(data) {
                    // redirect, passing along the uid the server returned via res.send
                    window.location = "jobrecsys.html?userid=" + data;
                });
                return false;
            });
        });
    </script>
</body>
</html>
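The serializeObject plugin above folds repeated form fields (the city checkboxes) into arrays so the demographics POST carries one value per single field and an array per multi-select. The same grouping, written without jQuery for clarity (groupPairs is an illustrative name; the Array.isArray check plays the role of the plugin's `.push` test):

```javascript
// Group [name, value] pairs the way $.fn.serializeObject does:
// a name seen once maps to its value; a name seen again becomes an array.
function groupPairs(pairs) {
  var o = {};
  pairs.forEach(function (p) {
    var name = p[0], value = p[1] || '';
    if (o[name] !== undefined) {
      if (!Array.isArray(o[name])) {
        o[name] = [o[name]];          // promote scalar to array on repeat
      }
      o[name].push(value);
    } else {
      o[name] = value;
    }
  });
  return o;
}

var form = [['age', '24'], ['citieslived', 'Groningen'], ['citieslived', 'Amsterdam']];
console.log(groupPairs(form));
// { age: '24', citieslived: ['Groningen', 'Amsterdam'] }
```

This mixed scalar-or-array shape is why the /save_demographics route on the server checks for both `citieslived[]` and `citieslived` before wrapping the value in an array.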
11.1.6 Appendix 6: jobs.js
var express = require('express');
var app = express();
var path = require('path');
var bodyParser = require('body-parser');
var mongoose = require('mongoose');
var raccoon = require('raccoon');

var Schema = mongoose.Schema;

// connect to mongodb
mongoose.connect('mongodb://127.0.0.1/thesis');

// create a schema
var itemSchema = new mongoose.Schema({ // collection
    itemid: String,
    jobtitle: String,
    place: String,
    companysize: Number,
    holidays: Number,
    salary: Number
});

var Item = mongoose.model('Item', itemSchema);

// make this available to the item in our Node applications
module.exports = Item;

// save the job to the database: create a document and store it
var insertItem = function(itemid, jobtitle, place, companysize, holidays, salary){
    var itemData = {
        itemid: itemid,
        jobtitle: jobtitle,
        place: place,
        companysize: companysize,
        holidays: holidays,
        salary: salary
    };
    console.log(itemData);
    var item = new Item(itemData);
    item.save(function(err) {
        if (err) {
            throw err;
        }
        console.log('Job created');
    });
};

// SEED JOB DATA FROM ORTHOGONAL ARRAY
Item.create({
    itemid: "item1",
    jobtitle: 'Market researcher in a research company',
    place: 'Groningen',
    companysize: 50,
    holidays: 20,
    salary: 2489
}, function (err, item) {
    if (!err) {
        console.log('Job 1 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item2",
    jobtitle: 'Market researcher in a research company',
    place: 'Amsterdam',
    companysize: 150,
    holidays: 25,
    salary: 2766
}, function (err, item) {
    if (!err) {
        console.log('Job 2 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item3",
    jobtitle: 'Market researcher in a research company',
    place: 'Rotterdam',
    companysize: 500,
    holidays: 30,
    salary: 3042
}, function (err, item) {
    if (!err) {
        console.log('Job 3 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item4",
    jobtitle: 'Market researcher in a research company',
    place: 'The Hague',
    companysize: 1500,
    holidays: 35,
    salary: 3180
}, function (err, item) {
    if (!err) {
        console.log('Job 4 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item5",
    jobtitle: 'Market researcher within an organization',
    place: 'Groningen',
    companysize: 150,
    holidays: 30,
    salary: 3180
}, function (err, item) {
    if (!err) {
        console.log('Job 5 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item6",
    jobtitle: 'Market researcher within an organization',
    place: 'Amsterdam',
    companysize: 50,
    holidays: 35,
    salary: 3042
}, function (err, item) {
    if (!err) {
        console.log('Job 6 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item7",
    jobtitle: 'Market researcher within an organization',
    place: 'Rotterdam',
    companysize: 1500,
    holidays: 20,
    salary: 2766
}, function (err, item) {
    if (!err) {
        console.log('Job 7 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item8",
    jobtitle: 'Market researcher within an organization',
    place: 'The Hague',
    companysize: 500,
    holidays: 25,
    salary: 2489
}, function (err, item) {
    if (!err) {
        console.log('Job 8 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item9",
    jobtitle: 'Product Manager',
    place: 'Groningen',
    companysize: 500,
    holidays: 35,
    salary: 2766
}, function (err, item) {
    if (!err) {
        console.log('Job 9 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item10",
    jobtitle: 'Product Manager',
    place: 'Amsterdam',
    companysize: 1500,
    holidays: 30,
    salary: 2489
}, function (err, item) {
    if (!err) {
        console.log('Job 10 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item11",
    jobtitle: 'Product Manager',
    place: 'Rotterdam',
    companysize: 50,
    holidays: 25,
    salary: 3180
}, function (err, item) {
    if (!err) {
        console.log('Job 11 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item12",
    jobtitle: 'Product Manager',
    place: 'The Hague',
    companysize: 150,
    holidays: 20,
    salary: 3042
}, function (err, item) {
    if (!err) {
        console.log('Job 12 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item13",
    jobtitle: 'Management Consultant',
    place: 'Groningen',
    companysize: 1500,
    holidays: 25,
    salary: 3042
}, function (err, item) {
    if (!err) {
        console.log('Job 13 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item14",
    jobtitle: 'Management Consultant',
    place: 'Amsterdam',
    companysize: 500,
    holidays: 20,
    salary: 3180
}, function (err, item) {
    if (!err) {
        console.log('Job 14 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item15",
    jobtitle: 'Management Consultant',
    place: 'Rotterdam',
    companysize: 150,
    holidays: 35,
    salary: 2489
}, function (err, item) {
    if (!err) {
        console.log('Job 15 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
Item.create({
    itemid: "item16",
    jobtitle: 'Management Consultant',
    place: 'The Hague',
    companysize: 50,
    holidays: 30,
    salary: 2766
}, function (err, item) {
    if (!err) {
        console.log('Job 16 saved!');
        console.log('_id of saved job: ' + item._id);
    }
});
mongoose.connection.close();
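The 16 jobs seeded above follow an orthogonal design in which each attribute level appears equally often. A quick balance check over the place attribute, with the city sequence copied from the listing (countLevels is an illustrative helper, not part of the thesis code):

```javascript
// Each of the four cities should occur exactly 4 times across the 16 jobs
// of the orthogonal array seeded above. Counting levels is a cheap sanity
// check that the design stayed balanced after editing the seed data.
var places = [
  'Groningen', 'Amsterdam', 'Rotterdam', 'The Hague',  // items 1-4
  'Groningen', 'Amsterdam', 'Rotterdam', 'The Hague',  // items 5-8
  'Groningen', 'Amsterdam', 'Rotterdam', 'The Hague',  // items 9-12
  'Groningen', 'Amsterdam', 'Rotterdam', 'The Hague'   // items 13-16
];

function countLevels(values) {
  var counts = {};
  values.forEach(function (v) { counts[v] = (counts[v] || 0) + 1; });
  return counts;
}

console.log(countLevels(places));
// { Groningen: 4, Amsterdam: 4, Rotterdam: 4, 'The Hague': 4 }
```

The same check can be repeated for jobtitle, companysize, holidays and salary; in a balanced orthogonal array every level of every attribute occurs four times.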