
F18021

Examensarbete 30 hp (degree project, 30 credits), June 2018

The effects of implementing domain knowledge in a recommender system

Kerstin Ersson


Faculty of Science and Technology, UTH unit
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Abstract

The effects of implementing domain knowledge in a recommender system

Kerstin Ersson

This thesis presents a domain knowledge based similarity measure for recommender systems, using Systembolaget's open API with product information as input data. The project includes the development of the similarity measure, implementing it in a content based recommender engine, as well as evaluating the model and comparing it to an existing model that uses a bag-of-words based approach. The developed similarity measure uses domain knowledge to calculate the similarity of three features, grapes, wine regions and production year, in an attempt to improve the quality of recommendations. The results show that the bag-of-words based model performs slightly better than the domain knowledge based model in terms of coverage, diversity and correctness. However, the results are not conclusive enough to discourage further investigation into using domain knowledge in recommender systems.

ISSN: 1401-5757, UPTEC F18021
Examinator (examiner): Tomas Nyberg
Ämnesgranskare (subject reader): Dave Zachariah
Handledare (supervisor): Siri Persson


Popular science summary (Populärvetenskaplig sammanfattning)

Today's Internet is full of data. Every second, about 8,000 tweets are posted, 3,000 Skype calls are made and 67,000 Google searches are carried out, according to the site Internet Live Stats.[1]

To navigate the amounts of data we meet when we search for information on Google, watch series on Netflix or shop on Amazon, we need help. To help their users, many of these sites use so called recommender systems, which help the user find relevant results.[2]

Recommender systems are often based on two datasets, one with information about users and one with item information. The systems base their recommendations on facts about the user's preferences and/or information about the items. Systems that compare users to find items to recommend are called collaborative filtering systems, and systems that compare data about the items are called content based.[3][4] The recommender system used in this project is content based.

To compare the items to be recommended, a method for calculating their likeness, a similarity measure, is needed. This measure has to be adapted to the kind of data the item descriptions contain.[5] Much of the data found on the Internet today is free text, for which a common way of measuring similarity is the so called bag-of-words method. Texts are represented as vectors, where each element denotes a word the text contains. The distance between the vectors is then calculated with, for example, cosine similarity.[6]

This project investigates whether the performance of a recommender system improves if the similarity calculation takes domain knowledge into account. Instead of only checking whether two items share the same properties, the system takes into account that a property can be more or less similar to other properties. The data used to test this comes from Systembolaget's product database for wines. Two models have been compared: one that uses the bag-of-words model and one that uses domain knowledge.

The results show that the bag-of-words model performs slightly better with respect to the three evaluation factors used: coverage, diversity and correctness. However, the result is not unambiguous, and further evaluation is needed before domain knowledge in recommendation models is dismissed.


Contents

1 Introduction
2 Background
  2.1 Problem formulation
  2.2 Project scope
  2.3 Related work
3 Theory
  3.1 Recommender systems
    3.1.1 Collaborative filtering
    3.1.2 Content-based recommendations
  3.2 Evaluating recommender systems
    3.2.1 Evaluation metrics used for this project
  3.3 Statistical data types
  3.4 Vector representation
    3.4.1 Example
  3.5 Similarity measures
    3.5.1 Cosine similarity
    3.5.2 Item-item similarity matrix
  3.6 Wine
    3.6.1 Grapes
    3.6.2 Wine regions
    3.6.3 Year of production
  3.7 Software
    3.7.1 Recommender Engine
    3.7.2 The website
4 Method
  4.1 The bag-of-words based recommender system
    4.1.1 Similarity score distribution
  4.2 The domain knowledge based recommender system
    4.2.1 Developing the customized similarity measure
    4.2.2 Grape similarity score
    4.2.3 Wine region similarity score
    4.2.4 Production year similarity score
    4.2.5 Weighting and normalizing similarity scores
    4.2.6 Similarity score distribution
5 Data processing
  5.1 Preprocessing the data
    5.1.1 The filtering process
  5.2 The processed dataset
  5.3 Cleaning data features
6 Training and evaluating the recommender systems
  6.1 Training
  6.2 Making recommendations
  6.3 Evaluating the algorithm
    6.3.1 Coverage
    6.3.2 Diversity
    6.3.3 Correctness
7 Result
  7.1 Performance of weighted setups for the knowledge based model
  7.2 Average similarity, coverage and diversity
  7.3 Correctness
8 Discussion
  8.1 The bag-of-words based recommender system
  8.2 The domain knowledge based recommender system
9 Conclusions
  9.1 Future work
A Appendix A
B Appendix B
C Appendix C
D Appendix D


1 Introduction

Today, machine learning and data mining are integrated in many of our daily routines. Data is gathered when we are googling, shopping online, listening to music, watching streaming TV services or scrolling through social media. To sort through these large amounts of data in order to find patterns or correlations, one can use data mining.

As users of these online services, we constantly have to filter through huge amounts of data to find items that are relevant to us. To improve user experience, and to improve sales, most of these services use so called Recommender Systems (RS) to help the user navigate the information overload.[7]

Recommender systems today are, in general, built on either similarity between users or similarity between items.[8] To determine whether two users or items are similar, one needs to use some kind of similarity measure.[9] In Figure 1, we see two well-known examples of recommender systems based on similarity between users, from Spotify and Netflix.

(a) Spotify recommender for radio stations

(b) Netflix recommender for series

Figure 1: Two examples of recommender systems for streaming services


This project will focus on the type of RS where item-item similarity determines the recommendations. In most such systems, items are represented by a number of features, and the engine simply checks whether a feature or tag is present or not, without considering inter-feature similarities. For example, a movie recommender simply checks whether or not a movie is a comedy, and does not consider a drama movie more similar to it than a horror film. A similar problem is also present in free text analysis, where the RS does not recognize synonyms.[2]

For recommender systems that base the recommendations on items that the user previously liked, one big problem is overspecialization. Overspecialization means that the RS only recommends items that are very similar to the previously liked ones, but nothing new and unexpected to the user. In order for an RS to be of value for a user, it has to be able to recommend items that the user has not yet discovered, and could not easily discover by themselves.[3]

To combat these challenges, one solution could be to implement domain knowledge in the similarity measure of the RS, thus making it possible to recommend items that are similar, but not necessarily in the same genre. This hypothesis is tested in this project; it requires a more complicated system and more work in the implementation phase.


2 Background

2.1 Problem formulation

The purpose of this project is to investigate whether implementing domain knowledge in the similarity measure can improve the performance of a recommender engine. The goal is to develop a similarity measure that is tailored to a specific dataset, and compare its performance to that of an existing algorithm for content based recommender systems.

2.2 Project scope

Due to the time constraints, the scope of the project is kept narrow. The RS will take one single item as input, and based on that provide the user with the top ten recommendations for that input.

Three performance measures will be studied:

• Coverage: How many of the items in the dataset are actually recommended to the user for a set number of randomized inputs?

• Diversity: How different are the recommended items for a certain input?

• Correctness: Which of the two models can present the most relevant recommendations to the user?

The project's main focus is the similarity measure, and the impact it has on the RS. Other important performance measures, such as usability and system-centric factors, will not be tested. However, some computational aspects will be taken into consideration in the discussion section of the report.

2.3 Related work

Recommender systems are very common in different online services today, where they play a big part in helping the user navigate huge amounts of data. Since these systems are so essential, a lot of research has been done in the field. Earlier work, up to 2007, mostly focused on recommender systems applied to shopping and movie recommendations, but since 2007 the scope has expanded to include other applications, such as documents and music, among others.[10]

Domain knowledge is introduced in systems to improve personalization of recommendations, and is also used to improve cross-domain recommendations. Two main approaches introduced by several authors are case based RSs and an RS model similar to content based systems. For the case based approach, the user enters some problem description, which is then compared to other cases stored in a database. The most similar case is then presented as the solution to the user. This model is presented by several authors, such as Khan et al.[11] and Chattopadhyay et al.[12], among others.

The work of Khan et al. resulted in a knowledge based recommender system (KBRS) for diet menus called MIKAS, which can be used for instance in a hospital, where many people have dietary restrictions due to different health conditions.[11] Chattopadhyay et al. built a KBRS for medical diagnosis, specifically PMS. For each new case, medical experts are presented with k similar cases, and if the result is satisfactory, the case is added to the case base.[12]

Several authors also developed systems similar to CBRS, where a similarity measure relates items in a knowledge base to a user in the user base. Towle et al. developed an explicit user and product description model, where instead of using implicit user models based on user ratings, items and users are explicitly labeled.[13]

A model presented by Ghani et al. uses a knowledge base with items and attributes associated with them. The feature extraction was done using text learning methods, and is supposed to give a semantic context to the item and thus also to the user profile.[14] Another similar approach was taken by Martínez et al., where the aim was to find a user preference model that uses a qualitative rather than a quantitative measure. The user profile is inferred from an example of an item that the user likes, and then the RS finds matching items in the knowledge base.[15]


3 Theory

This section describes the techniques and models used in the project, how they work and why they were used. The theory behind recommender systems and similarity measures is explained, and some background information about the wine characteristics used in this project is described.

3.1 Recommender systems

Data mining has a wide range of applications, everything from research to directed ads. Another example is recommendations: finding new content that might be of interest to a specific user.

The need for recommender systems can be derived from the so called long tail issue. Compare a physical record store to a streaming web service: the range of available items in a physical store has more constraints than its online counterpart, such as shelf space. Thus, in the store you might find a couple of thousand records, while for example Spotify has millions of titles.[16] So now, with our online presence, even extremely niched items have a place, as long as they can be presented to the relevant users.[17]

For recommender systems (RS), the available data usually contains a set of users and a set of items.[4] There are three RS techniques in particular:

1. Collaborative filtering (CF): bases recommendations on the ratings of an item i by a set of users that are similar to the user u.

People who liked this also liked...
People with preferences similar to you also liked...

2. Content-based (CB) methods use the features of an item to make recommendations.

If you like this you might also like...

3. Hybrid approaches use both methods (1) and (2) combined.

3.1.1 Collaborative filtering

Recommender systems based on collaborative filtering usually only need access to user rating data in order to provide recommendations; no other information about users or items is needed. This is the most common type of RS, and a well researched field.[18] In 2006, Netflix held a $1 million competition for the public to improve their RS, thus giving scientists and researchers access to a real, large data set with over 100 million ratings.[19]

Collaborative filtering algorithms can in turn be divided into subcategories. Memory-based CF algorithms memorize the full matrix of user-item ratings, and use that to make predictions. Model-based algorithms, on the other hand, use the user-item matrix to create a parametrized prediction model which is then used to make predictions.[20]

User-based models are the most popular kind of memory-based algorithms. They identify a set of users that are similar to the active user, the so called neighborhood, and use their ratings to predict whether the active user will like new items or not.[18] Item-based CF models similarly look at the set of items that the active user has previously rated and identify the set of items that are most similar to a specific new item. The model takes the average of the user's ratings on these selected items to predict what score the item would get.[21]

One of the most prominent advantages of using a CF model is its transferability. As long as the model has access to user-item ratings it can make predictions, regardless of the diversity of the present items. The same models can be used in many contexts, for movie recommendations as well as in e-commerce. Common problems for CF-based models include the cold start issue, meaning the difficulty in recommending items with no reviews, or recommending items to users that have not yet rated any items. Another big issue is the sparsity of the user-item matrix for systems with many users and/or many items. This also causes problems with regards to scalability.[21]

3.1.2 Content-based recommendations

For content-based filtering, recommendations are commonly based on the user profile, a register of the user's previous preferences. The algorithm finds the set of items that are most similar to items that the current user has given good ratings.[2] Since this project is not tested online, the user profile will consist of a single, positively rated item, provided by the user.

The similarity measurement is based on some categories or keywords that are assigned to the items. Which keywords or categories are relevant depends on the current user.[22] Content-based RS models can be summarized in three basic steps:[2]


• Structuring data: In particular, data where item descriptions are in the form of free text has to go through some preprocessing before being fed to the RS.

• Assembling user profile: In this project, the user profile will simply consist of the description of the one item given as input to the RS. For more advanced content-based RS models, the input in this step is usually a number of items and their ratings, which then has to be generalized.

• Filtering: The RS uses the user profile to generate similar items for recommendations, either in the form of single items or ranked top lists. To do this, some similarity measure is used. The similarity measure developed during this project is presented in Section 4.2.

Content based recommender systems solve some of the problems of collaborative filtering systems, as they can recommend items that do not have any ratings yet, and also do not suffer from the rating sparsity problem. Also, content based RSs can quickly adapt to new preferences by the users, and since they do not require extensive information about the user they are easy to make secure. Challenges with these models include relying on descriptive data about the items to make good recommendations, and that the system can only recommend items that are similar to items already rated by the user.[23]

3.2 Evaluating recommender systems

In Table 1, some variables to take into consideration when evaluating an RS are listed. These are grouped into four different categories: recommendation-centric, user-centric, system-centric and delivery-centric.[22]


Table 1: Categories of variables for RS evaluation

Recommendation-centric:
• Correctness: Compare made recommendations to some set of recommendations that are considered correct
• Coverage: How well does the RS cover the item or user set?
• Diversity: How dissimilar are the recommended items?
• Recommender confidence: How confident is the RS in its recommendations?

User-centric:
• Trustworthiness: How trustworthy are the recommendations?
• Novelty: How well does the RS find recommendations that are new or unknown to the user?
• Serendipity: Does the system find surprising but good recommendations?
• Utility: What is the value gained from the RS for users?
• Risk: User risk associated with each recommendation

System-centric:
• Robustness: How well does the RS tolerate bias or false information?
• Learning rate: How fast can the RS assimilate new information?
• Scalability: How scalable is the RS?
• Stability: Are the recommendations consistent over time?
• Privacy: Are there risks to user privacy?

Delivery-centric:
• Usability: Is the system user friendly?
• User preference: How do users perceive the RS?


The recommendation-centric variables focus mostly on objective evaluation of the recommendations, while user-centric variables focus on how the user experiences them. System-centric variables do not focus on the recommendations, but on the system itself, its robustness etc. The delivery-centric variables focus on how the user interacts with the recommendations, and whether the system is easy to use.[22]

3.2.1 Evaluation metrics used for this project

For this project, online measures of engine performance or user satisfaction with many users are not an option. The evaluation will focus on the three recommendation-centric measures correctness, coverage and diversity, as we want to investigate the effect of a customized similarity measure. Diversity and coverage are objective measures, which can be tested offline, while correctness will require online testing, as we do not have access to labeled training data. For more information on how these evaluation metrics were tested, see Section 6.3.

3.3 Statistical data types

Most of the unprocessed data available for use in recommender systems is free text data in the form of descriptions or reviews. However, statistical data can contain two general kinds of variables. Nominal data is qualitative and can be symbols or names of things. Nominal data that takes a fixed number of values, e.g. colors, is referred to as categorical. Numerical data is quantitative and usually represented by numbers. For numerical data, statistical information such as mean and median values can be of interest, but for nominal data it usually is not meaningful.[24]

3.4 Vector representation

When determining similarity between items containing non-numerical features, converting the data to some numerical representation is a common choice of method. One way to do this is to use vector representation.[25]

For free text data, the most commonly used method is bag-of-words (BOW), which uses word stems as representation, where the order in which the words appear is assumed to have little importance. Thus, a text is represented by each distinct word, w_i, appearing in it. The value for each feature is the number of times that word appears in the text. Common limitations to this method include excluding stop-words, such as "and" or "but", and words which appear fewer times than some threshold value.[26]

3.4.1 Example

We have the sentences (1) "Adam likes apples" and (2) "Karen does not like apples". With BOW (but without stopwords and stemming), this would be represented according to Equation 1.

(1) "Adam likes apples" ⇒ s1 = {Adam: 1, likes: 1, apples: 1}
(2) "Karen does not like apples" ⇒ s2 = {Karen: 1, does: 1, not: 1, like: 1, apples: 1}    (1)

To compare these sentences, we rewrite the vectors to contain all possible words, in this case s = {Adam, Karen, does, not, likes, like, apples}, see Equation 2.

(1) s1 = {Adam: 1, Karen: 0, does: 0, not: 0, likes: 1, like: 0, apples: 1} ⇒ s1 = [1, 0, 0, 0, 1, 0, 1]
(2) s2 = {Adam: 0, Karen: 1, does: 1, not: 1, likes: 0, like: 1, apples: 1} ⇒ s2 = [0, 1, 1, 1, 0, 1, 1]    (2)

From here, these vectors can be compared using some similarity measure, e.g. cosine similarity.
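As a concrete illustration, the following sketch (not part of the thesis code) builds the count vectors of Equation 2 by hand for the two example sentences, using the fixed vocabulary order given above.

from collections import Counter

# Tokens of the two example sentences from Section 3.4.1.
s1_tokens = "Adam likes apples".split()
s2_tokens = "Karen does not like apples".split()

# Shared vocabulary in the fixed order used in Equation 2.
vocabulary = ["Adam", "Karen", "does", "not", "likes", "like", "apples"]

def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

print(bow_vector(s1_tokens, vocabulary))  # [1, 0, 0, 0, 1, 0, 1]
print(bow_vector(s2_tokens, vocabulary))  # [0, 1, 1, 1, 0, 1, 1]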

3.5 Similarity measures

Both in CF and CB recommender engines, similarity measures play an important part and can impact how well the RS performs. The most common choices, for CF models in particular, are cosine similarity or the Pearson correlation coefficient.[27]

Each data type, i.e. nominal, numerical etc., has a different set of appropriate similarity measures. However, real data usually contains data of mixed types. Despite this, similarity measures for mixed data are relatively unexplored.[28]


Previous work mostly focuses on similarity measures for clustering applications, such as a method where a preprocessing step converts all data into either numerical or nominal form.[29]

Some research has been done in the field of similarity measures that apply domain knowledge or semantics to make the similarity measure "smarter". Semantic similarity measures have a wide range of applications in natural language processing and related areas.[30] One example of an application is cross-domain recommendations, usable in for example e-commerce.[31]

3.5.1 Cosine similarity

When items can be represented as vectors, a common choice for measuring the similarity of two vectors is to take the cosine of the angle between the vectors. This measure is called cosine similarity.[2] For two vectors u and v, the cosine similarity is defined as given by Equation 3.

cos(u, v) = (u · v) / (‖u‖ ‖v‖)    (3)
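As an illustration only (not thesis code), Equation 3 can be implemented with numpy and applied to the bag-of-words vectors from the example in Section 3.4.1:

import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (Equation 3)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# The two example sentences share only the word "apples", giving a low score.
print(cosine_similarity([1, 0, 0, 0, 1, 0, 1], [0, 1, 1, 1, 0, 1, 1]))  # about 0.26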

3.5.2 Item-item similarity matrix

The item-item similarity matrix is central to the recommender systems in this project. For a recommender engine with n items, the similarity matrix will be an n x n matrix. The matrix is built by calculating the similarity scores between all possible pairs of items, as seen in Algorithm 1. Since the similarity score between two items is symmetric, similarity(i, j) = similarity(j, i), the similarity matrix will be symmetrical. Thus, similarity scores only need to be calculated for half the matrix.

Algorithm 1: Assemble similarity matrix with n items
Data: product dataset
Result: item-item similarity matrix, sim_mat

1  for all n items i do
2      sim_mat(i, i) = 1    /* the similarity score for an item with itself is always 1 */
3      for all items j < i do
4          calculate the similarity between item i and item j
5          set sim_mat(i, j) and sim_mat(j, i) to similarity(i, j)
6      end
7  end
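A minimal Python sketch of Algorithm 1 is shown below; it assumes a list of item records and any symmetric pairwise scoring function, and is not taken from the thesis code.

import numpy as np

def build_similarity_matrix(items, similarity):
    """Assemble the symmetric item-item similarity matrix of Algorithm 1."""
    n = len(items)
    sim_mat = np.zeros((n, n))
    for i in range(n):
        sim_mat[i, i] = 1.0              # an item is always identical to itself
        for j in range(i):               # only the lower triangle is computed...
            score = similarity(items[i], items[j])
            sim_mat[i, j] = score
            sim_mat[j, i] = score        # ...and mirrored, since the measure is symmetric
    return sim_mat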

3.6 Wine

Two of the factors that most affect the taste and style of a wine are the grapes that it is made of and the location where the grapes are grown.[32] Thus, these were two features of big importance for the similarity measure developed in this project.

3.6.1 Grapes

No one knows exactly how many different grape varieties exist today, but there are at least several thousand. However, most modern wines use only a few dozen types in different constellations. Fine wines are said to originate from France, and thus most grapes are of French descent.[32]

Different aromas and flavors are associated with different grapes. These aromas can then be grouped together into flavor profiles such as spicy, round and high tannin, which in turn form style profiles like fruity, savory and sweet, for example.[33]


Table 2: Grape varieties found in Systembolaget's product dataset, sorted into blue and green grapes

Blue grapes: Aglianico, Barbera, Cabernet Franc, Cabernet Sauvignon, Carmenère, Corvina, Gamay, Grenache, Malbec, Merlot, Mourvèdre, Nebbiolo, Negroamaro, Nero d'Avola, Pinot Noir, Pinotage, Primitivo, Sangiovese, Syrah/Shiraz, Tannat, Tempranillo, Touriga Nacional, Zinfandel

Green grapes: Albariño, Chardonnay, Chenin Blanc, Furmint, Garganega, Gewurztraminer, Godello, Grüner Veltliner, Marsanne, Melon de Bourgogne/Muscadet, Muskat, Pinot Blanc, Pinot Gris, Riesling, Sauvignon Blanc, Savagnin, Semillon, Solaris, Torrontés, Verdicchio, Vermentino, Viognier

The grapes used by Systembolaget are listed in Table 2, sorted by blue and green varieties.

3.6.2 Wine regions

Another important factor for the taste of a wine is the location at which the grapes were grown. In Systembolaget's dataset, both country and region of origin are available features.

The most important factor for the growing region's effect on the wine is climate, which affects the general characteristics of a wine. Wines from colder regions are often more acidic, crisp and light-bodied, with flavors of green fruits and herbs. Grapes from warmer regions tend to give full-bodied, bolder wines with higher alcohol content and flavors of dark fruits. Other factors, such as the soil quality, also have an effect on the wine, but in a much more subtle manner.[32] Table 3 shows the typical characteristics of wines from different climates. The data in the table was gathered by Jones, G.[34], and was published on the GuildSomm website, which is a website for sommeliers and wine professionals.[35]

Table 3: Characteristics of wines from different climate types

Wine characteristic | Cool              | Intermediate      | Warm
Fruit               | Lean, tart        | Ripe, juicy       | Overripe, lush
White flavor notes  | Apple, pear       | Peach, melon      | Mango, pineapple
Red flavor notes    | Cranberry, cherry | Berry, plum       | Fig, prune
Body                | Light             | Medium            | Full
Acidity             | Crisp, tangy      | Integrated        | Soft, smooth
Alcohol             | Low to moderate   | Moderate to high  | High to very high
Overall style       | Subtle, elegant   | Medium intensity  | Bold

3.6.3 Year of production

As weather conditions vary from year to year, so does the quality of the wine, and regions with more variations in weather are more affected. Thus, the year in which a wine was made, its vintage, is of importance for a wine's quality. For example, if it rains late in the wine growing season, that can make grapes watery and give less flavor, and thus that vintage will be of lower quality.[36]


3.7 Software

3.7.1 Recommender Engine

The code for the recommender engine developed in this project was written in Python 2.7. The following Python libraries were used:

• pandas

• scikit-learn

• numpy

• scipy

pandas is an open source library for Python that provides tools for data structures and data analysis. In particular, pandas provides structures to manipulate and store data: so called Series for 1D data, and DataFrames for 2D data.[37]
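A small illustration of the two structures mentioned above, with made-up product data rather than the Systembolaget dataset:

import pandas as pd

prices = pd.Series([89.0, 129.0, 99.0], name="Price")      # 1D data
products = pd.DataFrame({                                   # 2D data
    "Name": ["Wine A", "Wine B", "Wine C"],
    "Product type": ["Red wine", "White wine", "Red wine"],
    "Price": prices,
})
print(products[products["Product type"] == "Red wine"])     # simple row filtering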

3.7.2 The website

For the evaluation of the models, a simple website was built using the Python web framework Flask. Flask allows simple web applications to be developed easily, by supporting extensions for database implementations etc. without containing this functionality itself.[38]
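For orientation, a minimal Flask sketch of the kind of endpoint such an evaluation site could expose is shown below; the route and the placeholder data are hypothetical and not the thesis implementation.

from flask import Flask, jsonify

app = Flask(__name__)

# Placeholder recommendations standing in for the two models' top-10 lists.
DUMMY_RECS = {"model A": ["wine 1", "wine 2"], "model B": ["wine 3", "wine 4"]}

@app.route("/recommend/<int:article_number>")
def show_recommendations(article_number):
    # In the real system, the two top-10 lists would be looked up here (see Section 6.2).
    return jsonify(DUMMY_RECS)

if __name__ == "__main__":
    app.run(debug=True)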


4 Method

This section will describe the two models evaluated in this project and how their respective similarity matrices are assembled.

4.1 The bag-of-words based recommender system

This section will present the vector representation based model that was compared to the knowledge based model, which is presented in Section 4.2.

In order to use a similarity measure for items with some non-numerical features, the data needs to be represented as vectors. There are several word to vector (word2vec) algorithms available. For the first setup, using categorical and text data represented by vectors, the scikit-learn class CountVectorizer is used, which converts a collection of texts into a matrix of term counts.
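A hedged sketch of this setup: CountVectorizer turns concatenated item descriptions into a term-count matrix, and scikit-learn's cosine_similarity turns that into an item-item similarity matrix. The example descriptions below are invented, not Systembolaget data.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = [
    "red wine tuscany sangiovese",
    "red wine piedmont nebbiolo",
    "white wine mosel riesling",
]

vectorizer = CountVectorizer()
term_matrix = vectorizer.fit_transform(descriptions)   # items x terms, sparse counts
bow_mat = cosine_similarity(term_matrix)                # items x items similarity matrix
print(bow_mat.round(2))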

4.1.1 Similarity score distribution

To test how the similarity score distribution differs for the two tested models, we look specifically at the similarity scores for the item with article number 2800. For the vector representation based RS, there are about 1500 items with a similarity score above 0, as can be seen in Figure 2. We can also see that about 100 items have a similarity score higher than 0.4, which corresponds to roughly three features having exact matches. Around 1000 items have a similarity score close to 0.14, which corresponds to one feature having an exact match.


Figure 2: Similarity score distribution for item with article number 2800 using the BOW approach with cosine similarity

A heat map of the similarity matrix can be seen in Figure 3. The bright diagonal shows all items have a similarity score of 1 with themselves, which is to be expected.

Figure 3: Similarity score heat map for all items in Systembolaget's product database using the BOW model with cosine similarity


4.2 The domain knowledge based recommender system

4.2.1 Developing the customized similarity measure

For the customized similarity measure developed during this project, domain knowledge is used to try to improve the precision of the similarity in three features: grapes, regions of origin and vintages. This section will explain how these similarity measures are calculated and the data used to do this. These three features were chosen because there is a lot of information about how they affect the taste of wine, and thus they would be rather simple to implement.

The features which do not have domain knowledge implemented will still be used, and for those features BOW with cosine similarity is applied. The results are formed by calculating a weighted average and stored in the similarity matrix. All similarity scores are normalized.

4.2.2 Grape similarity score

The first feature that is subject to a customized similarity score is Description of content, where the grapes are specified. This is the first feature where domain knowledge is introduced, in the form of a tree map showing the flavor profile for the different grapes. The tree map is derived from a wine map made by the wine enthusiast website Wine Folly.[39] The complete tree maps for red, white and sparkling wines can be found in Appendix A.

The similarity is calculated using the formula given by Equation 4.

similarity = (nodes in common) / (maximum possible nodes in common)    (4)

Example: Carmenere is a fruity red grape with 4/4 nodes in common with Zinfandel, giving a similarity score of 1, but only 2/4 with Sangiovese, corresponding to a similarity score of 0.5.
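A sketch of Equation 4, under the assumption that each grape is stored as its path of nodes in the flavor tree map; the example paths below are invented for illustration and are not taken from the Wine Folly maps in Appendix A.

def grape_similarity(path_a, path_b):
    """Shared leading nodes divided by the maximum possible number of shared nodes (Equation 4)."""
    shared = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        shared += 1
    return shared / max(len(path_a), len(path_b))

# Hypothetical 4-node paths through a red wine flavor tree map.
carmenere = ("red", "fruity", "round", "group A")
zinfandel = ("red", "fruity", "round", "group A")
sangiovese = ("red", "fruity", "tart", "group B")

print(grape_similarity(carmenere, zinfandel))   # 4/4 nodes in common -> 1.0
print(grape_similarity(carmenere, sangiovese))  # 2/4 nodes in common -> 0.5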

4.2.3 Wine region similarity score

To calculate the similarity of wine regions, they are divided into three climate groups which result in different flavor characteristics in the wine. The climate types and their characteristics are shown in Table 3, in the Theory section of the report.

Table 4: Wine regions divided into generalized climate types

Cool: Burgundy, Champagne, Loire Valley, Alsace, Triveneto, New Zealand, Austria, Germany, North central Spain, Canada, Northern Spain

Intermediate: Bordeaux, Tuscany, Piedmont, Portugal, Pacific Northwest (USA), Chile

Warm: Rhône Valley, Southern Italy, Southern Spain, Argentina, Greece, California, Australia, South Africa, Mediterranean Spain

A generalized result of the division of wine regions into climate zones is presented in Table 4, and the full table can be seen in Appendix B. The table was constructed using data from Old [32] and the Systembolaget website.[40] The similarity between two regions is then calculated using the matrix shown in Table 5.

Table 5: Similarity scores between regions with different climate types

             | Cool | Intermediate | Warm
Cool         | 1    | 0.5          | 0
Intermediate | 0.5  | 1            | 0.5
Warm         | 0    | 0.5          | 1

For example, the similarity between Champagne, cool climate, and Rhône Valley, warm climate, is 0, and the similarity between Champagne and Mosel in Germany, cool climate, is 1.
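A sketch of the region similarity lookup: each region is mapped to a climate type (Table 4) and pairs of climate types are scored with the matrix in Table 5. Only a handful of example regions are included here.

CLIMATE = {
    "Champagne": "cool",
    "Mosel": "cool",
    "Bordeaux": "intermediate",
    "Rhone Valley": "warm",
}

# Upper triangle of the symmetric matrix in Table 5.
CLIMATE_SIMILARITY = {
    ("cool", "cool"): 1.0, ("cool", "intermediate"): 0.5, ("cool", "warm"): 0.0,
    ("intermediate", "intermediate"): 1.0, ("intermediate", "warm"): 0.5,
    ("warm", "warm"): 1.0,
}

def region_similarity(region_a, region_b):
    """Climate based similarity between two wine regions (Tables 4 and 5)."""
    climate_a, climate_b = CLIMATE[region_a], CLIMATE[region_b]
    pair = (climate_a, climate_b)
    # The matrix is symmetric, so fall back to the reversed pair if needed.
    return CLIMATE_SIMILARITY.get(pair, CLIMATE_SIMILARITY.get((climate_b, climate_a)))

print(region_similarity("Champagne", "Rhone Valley"))  # 0.0
print(region_similarity("Champagne", "Mosel"))         # 1.0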


4.2.4 Production year similarity score

On Systembolaget's webpage, there are tables with ratings of different vintages from different well-known wine regions.[41] An example can be seen in Figure 4, where ratings for different vintages from French wine regions are displayed.

Figure 4: Excerpt from Systembolaget's vintage rating table for wine regions in France

For the production year similarity measure, these tables were used to create average ratings for each region and listed year, which were then transferred into a csv file, see Appendix C. The similarity between two wines with given regions of origin and vintages is then calculated as the difference in rating, as shown in Equation 5.

similarity = |rating_wine 1 − rating_wine 2|    (5)
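A sketch of Equation 5 with invented vintage ratings; the real values are the averaged Systembolaget ratings in Appendix C. Note that, as stated, the score is the absolute difference between the two wines' vintage ratings.

# Hypothetical (region, vintage) -> average rating values.
VINTAGE_RATING = {
    ("Bordeaux", 2015): 4.5,
    ("Bordeaux", 2013): 3.0,
    ("Burgundy", 2015): 4.0,
}

def vintage_score(region_a, year_a, region_b, year_b):
    """Rating difference between two wines, as given by Equation 5."""
    return abs(VINTAGE_RATING[(region_a, year_a)] - VINTAGE_RATING[(region_b, year_b)])

print(vintage_score("Bordeaux", 2015, "Bordeaux", 2013))  # 1.5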

4.2.5 Weighting and normalizing similarity scores

The three similarity measures where domain knowledge is used are weighted equally to form a matrix, mod_mat. For each matrix value mod_mat(i, j), the RS checks how many of the three domain knowledge features k are present for both item i and item j, sums their similarity scores and normalizes them. The formula is given by Equation 6, where n is the number of features present. If n = 0, the matrix value is 0.

mod_mat(i, j) = (1/n) Σ_{k=1}^{n} similarity_k(i, j)   if n ≠ 0
mod_mat(i, j) = 0                                      if n = 0    (6)

We now have two matrices with normalized similarity scores: mod_mat and the matrix with BOW cosine similarity scores, bow_mat. These matrices will be added with different weights throughout the evaluation process, as shown in Equation 7.

sim_mat = w_a · mod_mat + w_b · bow_mat
w_a = 0.05, 0.1, ..., 0.95
w_b = 1 − w_a    (7)
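A sketch of Equations 6 and 7: the per-feature domain knowledge scores are averaged over the features present for both items, and the resulting matrix is blended with the bag-of-words matrix for a range of weights. The data in the example is made up.

import numpy as np

def combine_feature_scores(feature_scores):
    """Equation 6: average the domain knowledge scores that are present (None = feature missing)."""
    present = [score for score in feature_scores if score is not None]
    return sum(present) / len(present) if present else 0.0

def blend_matrices(mod_mat, bow_mat, w_a):
    """Equation 7: weighted sum of the two similarity matrices, with w_b = 1 - w_a."""
    return w_a * np.asarray(mod_mat) + (1 - w_a) * np.asarray(bow_mat)

# Grape and region scores known, vintage missing for this pair of items.
print(combine_feature_scores([0.5, 1.0, None]))   # 0.75
mod_mat = [[1.0, 0.2], [0.2, 1.0]]
bow_mat = [[1.0, 0.6], [0.6, 1.0]]
for w_a in np.arange(0.05, 1.0, 0.05):            # the weights tested in Section 7.1
    sim_mat = blend_matrices(mod_mat, bow_mat, w_a)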

4.2.6 Similarity score distribution

For the domain knowledge setup, using w_a = 0.3, w_b = 0.7, about 1600 items have a similarity score over 0, which can be seen in Figure 5. Most of these items have similarity scores between 0.1 and 0.3. Compared to the strict BOW model, the similarity scores are more evenly distributed.


Figure 5: Similarity score distribution for item with article number 2800 using the weighted domain knowledge model. Weight of domain knowledge similarity score is 0.3.

In Figure 6, we can see the heat map for all similarity scores for the domain knowledge based recommender system, also using w_a = 0.3, w_b = 0.7. Similarly to the bag-of-words based model, seen in Figure 3, we see the bright diagonal indicating that all items have similarity score 1 with themselves. Here we can also see that the domain knowledge based RS has lower similarity scores in general, and that the patterns look different compared to Figure 3. Thus, we can conclude that the domain knowledge based model will probably give slightly different recommendations from the bag-of-words based model.


Figure 6: Heat map for similarity scores using the weighted domain knowledge based model. Weight of domain knowledge similarity score is 0.3.


5 Data processing

For this project, data from Systembolaget's open API has been used. The dataset was chosen since it contains several interesting features, and products that are known to the target group that will be evaluating the RS.

In this section, the preprocessing steps of the data will be discussed and statistical information about the resulting dataset will be shown.

5.1 Preprocessing the data

The data contains parts of the product information, including type of product, a short free text description and price. The dataset used in this project is an example of so called semi-structured data: it contains both features with restricted values, structured data, and free text fields, unstructured data.[6]


Table 6: Features in the Systembolaget dataset, sorted into features selected to be used by the recommender system and discarded features

Selected features: Name, Name2, Description of content, Product type, Production year, Producer, Region of origin

Discarded features: Product number, Product ID, Pant, Price per liter, Sales start date, Sales stopped, Class of goods, Style, Packaging, Sealing, Supplier, Year of tasting, Assortment, Text assortment, Organic, Ethical, Ethical branding, Kosher, Price incl. taxes, Alcohol content, Volume in ml, Land of origin

In Table 6, the features in the dataset are listed. Of the about 30 features in the original dataset, seven were chosen to be used in the recommender system. For the modified RS model, customized similarity measures will be developed for three features: Year of origin, Description of content, and Region of origin. These features were chosen as they have a big impact on the flavor of the wine, which will probably be the most interesting aspect for the user. The other selected features, Name, Name2, Product type, and Producer, will be measured with the BOW approach for the existing model as well as the modified model.

The original dataset from Systembolaget contains 18670 entries, including items that are not wine; these were sorted out and discarded. Duplicate entries were also removed from the dataset, as well as items that are no longer for sale.

5.1.1 The filtering process

The following steps were taken for the data preprocessing (a pandas sketch of the filtering follows the list):

• Remove special characters and Swedish letters åäö from dataset

• Remove items from other product categories than wine (from 18K to 10K entries)

• Remove items that are no longer for sale

• Remove features that are not selected to be used by the RS

• Remove items that have no value for the Product Type feature (from 10K to 3.8K entries)
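A hedged pandas sketch of the filtering steps above. The column names ("Product group", "Discontinued", "Product type") are placeholders; the actual field names in Systembolaget's API response may differ.

import pandas as pd

def preprocess(raw, selected_features):
    """Filter the raw product DataFrame down to wines with the selected features."""
    df = raw.drop_duplicates()
    df = df[df["Product group"].str.contains("wine", case=False, na=False)]  # wine only
    df = df[~df["Discontinued"]]                                             # still for sale
    df = df[selected_features]                                               # keep selected features
    df = df.dropna(subset=["Product type"])                                  # must have a product type
    return df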

5.2 The processed dataset

The dataset contains 3752 items with seven features after the preprocessing. In Figure 7, the density of the dataset can be seen. The features with the most missing data are Description of content and Year of origin.

Figure 7: Data density for 12 features in the dataset


There are three product types present in the processed dataset: red, white, and sparkling wine. Figure 8 shows the distribution of these product types.

Figure 8: Distribution of product types in the processed Systembolaget dataset

In Figure 9, we see the distribution of the styles of wine present in the dataset. Apart from an over-representation of white, dry wines, the dataset is relatively balanced in that regard.


Figure 9: Distribution of styles present in the Systembolaget dataset

5.3 Cleaning data features

In order to use the BOW approach, some further processing is needed. All text needs to be in lower case, and names need to be written without spaces. The processing steps, and which features they are applied to, are listed below (a small cleaning sketch follows the list):

• All features: All text in lower case.

• Features with names of producers, regions, grapes etc.: Remove spaces, e.g. côtes du rhône becomes côtesdurhône

• Description of content: Remove percentage signs
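A small sketch of the cleaning steps above, applied to a single text value; the helper is illustrative and not the thesis code.

def clean_value(text, is_name=False):
    """Lower-case a text value, drop percentage signs, and remove spaces from names."""
    text = text.lower()                  # all text in lower case
    text = text.replace("%", "")         # remove percentage signs (Description of content)
    if is_name:
        text = text.replace(" ", "")     # e.g. "côtes du rhône" -> "côtesdurhône"
    return text

print(clean_value("Côtes du Rhône", is_name=True))   # côtesdurhône
print(clean_value("Syrah 80%, Grenache 20%"))        # syrah 80, grenache 20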


6 Training and evaluating the recommender systems

In this section, the process of training and evaluating the RS is presented, as well as information about the evaluation metrics and how they were tested.

6.1 Training

The training process consists of building the similarity matrix for the RS. The similarity matrix gathers all similarity scores between all items in the dataset. In the case of the free text implementation, the similarity matrix is built using the scikit-learn functions CountVectorizer and cosine_similarity. The CountVectorizer function performs the BOW transformation from text to vectors described in Section 3.4.1, and the cosine_similarity function calculates the similarity between all items in the matrix.

For the customized RS, the similarity matrix is a weighted combination of the matrix containing the similarity scores calculated with the customized method and a matrix built using BOW with cosine similarity.

For both models, new items in the dataset would require the similarity matrices to be updated to make sure the similarities for the new item are added, thus increasing the size of the matrices.

6.2 Making recommendations

The RS implemented in this project is a so called Top 10 RS, meaning that the ten most similar items will be recommended to the user. The recommendation process thus consists of finding the similarity scores with all other items, adding them to a list, and then sorting the list with the highest scores first. This is explained in Algorithm 2.

Algorithm 2: Make n recommendations for item i
input: item i
output: n recommended items

1  get row i from similarity_matrix
2  sort the list from highest to lowest score
3  select items 2 : n+1    /* item 1 = item i */
4  present result to user
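A numpy sketch of Algorithm 2: take row i of the similarity matrix, rank it, and return the n most similar items while skipping item i itself.

import numpy as np

def recommend(similarity_matrix, i, n=10):
    """Return the indices of the n items most similar to item i (Algorithm 2)."""
    scores = np.asarray(similarity_matrix)[i]
    ranked = np.argsort(scores)[::-1]    # item indices, highest score first
    ranked = ranked[ranked != i]         # item i is always most similar to itself
    return ranked[:n].tolist()

sim_mat = [[1.0, 0.2, 0.9], [0.2, 1.0, 0.4], [0.9, 0.4, 1.0]]
print(recommend(sim_mat, i=0, n=2))      # [2, 1]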


6.3 Evaluating the algorithm

As previously mentioned, the evaluation metrics used in this project are correctness, coverage and diversity. The similarity score distribution and the average similarity for recommended items are also investigated. Coverage is calculated by storing all items that have been recommended over a large number of runs, and dividing that by the total number of items in the dataset.

Average similarity, coverage and diversity are evaluated by running the RS n = 1000 times and calculating average values, see Algorithm 3.

Algorithm 3: Evaluating the recommender system

1   create the RS instance
2   randomly select n items
3   for all selected items do
4       make recommendations
5       store mean similarity score of recommended items
6       calculate and store diversity
7       for all recommended items do
8           if item not in recommended_list then
9               add item to recommended_list
10          end
11      end
12  end
13  calculate average similarity, coverage and diversity
    /* cov = length(recommended_list)/length(all_items) */

6.3.1 Coverage

When measuring the coverage of the system used in this project, we will look at how many items of the total dataset are recommended when the RS is run 1000 times. In Equation 8, I_r is the set of items that are recommended during the 1000 runs and I is the full set of items in the dataset.

coverage = |I_r| / |I|    (8)


6.3.2 Diversity

To measure the diversity of the recommended items, I am using the modified similarity measure developed for the customized recommender engine. The measure used is the average diversity between all recommended items.

For each item, k, in the set of recommended items I = {I_i}, i = 1, ..., n, the diversity is calculated as given in Equation 9, where n = 9 in our case with 10 recommended items.

diversity_k = (1/n) Σ_{I_i, i ≠ k} (1 − similarity(I_i, I_k))    (9)
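A sketch of the offline part of the evaluation loop (Algorithm 3), restricted to coverage (Equation 8) and diversity (Equation 9). It assumes the recommend function sketched in Section 6.2 and a similarity matrix of the kind built in Section 3.5.2.

import random

def evaluate(similarity_matrix, recommend, n_runs=1000, n_recs=10):
    """Average coverage and diversity over n_runs randomly chosen input items."""
    n_items = len(similarity_matrix)
    recommended = set()
    diversities = []
    for _ in range(n_runs):
        item = random.randrange(n_items)
        recs = recommend(similarity_matrix, item, n_recs)
        recommended.update(recs)
        # Equation 9: average dissimilarity of each recommended item to the others.
        per_item = [
            sum(1 - similarity_matrix[k][i] for i in recs if i != k) / (len(recs) - 1)
            for k in recs
        ]
        diversities.append(sum(per_item) / len(per_item))
    coverage = len(recommended) / n_items              # Equation 8
    diversity = sum(diversities) / len(diversities)
    return coverage, diversity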

6.3.3 Correctness

For this project, there was no rating data to use in the evaluation process. To test the correctness, or the quality of the recommendations, two-choice testing will be used. The user is presented with recommendations from both the modified model and the standard BOW model, as seen in Figure 10, and will then be asked to fill in a questionnaire to answer which model made the best recommendations.

Figure 10: Screenshot from the website implementation, showing how the user is presented with recommendations from both RS models

The users will be asked two questions in the questionnaire:


1. Which model (A or B) makes the best recommendations?

2. Why?

What best recommendations means in this context is up to the user to decide, but the user is presented with some example factors to consider, such as relevance or serendipity.


7 Result

7.1 Performance of weighted setups for the knowledge based model

To test which setup performs best in terms of weighting the similarity score for the domain knowledge based recommender system, the model was run with several values for the weights.

In Figure 11, we can see that the coverage and diversity are only slightly affected by the changes in weights, while the average similarity strongly decreases when the weight for the knowledge based similarity score increases. This is expected, as this part of the similarity score is in general significantly lower than the bag-of-words part.

Figure 11: Average similarity, coverage and diversity vs weight of features where domain knowledge based similarity scores were applied

We also note a marginal increase in diversity as the domain knowledge's weight is increased. However, this is probably not an indication that the domain knowledge based model has better diversity in its recommendations, but rather an effect of the fact that the similarity score is on average lower for this model.

Similarly, the changes in coverage are probably due to the randomization of the tested items, rather than the changes in the weights.

7.2 Average similarity, coverage and diversity

In Figure 12, the average similarity scores for both the domain knowledge based model and the bag-of-words based model are shown. There is a rather large difference between the similarity scores of these two models, about 0.2, which in part can be explained by the fact that several of the features used for the domain knowledge based similarity score have a rather high missing-rate, as seen in Figure 7, resulting in a lower average for that part of the similarity score. However, the results follow a similar pattern for both models, decreasing as the number of recommendations is increased. This is expected, as when you introduce more recommendations, some of them will have lower similarity scores, thus affecting the average value.

Figure 12: Average similarity for recommended items vs number of recommended items per run for the vector representation standard model and the knowledge based model

We see a similar pattern in Figures 13 and 14, where average coverage and diversity are shown. For coverage, it is intuitive that as the number of items recommended in each run is increased, the proportion of items recommended during 1000 randomized runs will increase. This is also what we see in Figure 13. We can also note that here too, the bag-of-words based model performs slightly better, with a score approximately 0.5 higher than the domain knowledge based system.

Figure 13: Coverage for recommended items vs number of recommended items per run for the bag-of-words based model and the domain knowledge based model

Similarly, the bag-of-words based model also performs slightly better in terms of diversity. In Figure 14, we can note that the average diversity is about 0.015 lower for the domain knowledge based model.


Figure 14: Diversity for recommended items vs number of recommended items per run for the bag-of-words based model and the domain knowledge based model

7.3 Correctness

The questionnaire was sent out in a Facebook group for wine tasting in order to reach people who are quite well-versed in the subject and thereby get qualitative answers. The results of the 24 responses are shown in Figure 15. Thirteen people, or 57.1 %, preferred the recommendations of model A, which is the domain knowledge based model, while eleven people, or 42.9 %, preferred model B, which is the bag-of-words based model.


Figure 15: Answers to the questionnaire about recommendation quality, where A is the domain knowledge based model and B is the bag-of-words based model

The users were also asked why they preferred the model they chose. Six users named price as a factor in choosing the best recommendations, even though they were asked not to consider price. Several users commented that the bag-of-words based model gave recommendations that were more similar to the original choice than those of the domain knowledge based RS. Two users also commented that the domain knowledge based model gave more varied recommendations, which these users preferred. All questionnaire answers are shown in Appendix D.

To summarize, the results from the questionnaire show no distinct advantage to either system. The comments also show that people prefer different kinds of recommendations.


8 Discussion

The results shown in Section 7.2 show that the bag-of-words based model outperforms the domain knowledge based system with regard to average similarity, coverage and diversity for this particular dataset. However, a factor that could have significantly impacted these results is that the density of data for these three features is rather low. The current setup only handles the case where one or two of the three features in question are missing, not the case where all three are missing. If all three features are missing for an item, this part of the similarity score will be 0 and thus affect the total similarity score.
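
To illustrate the issue, a sketch of one way the knowledge based part of the score could treat missing features is given below; the renormalisation over the features that are present is an assumption used for illustration, not necessarily the exact behaviour of the implementation.

```python
def knowledge_part(feature_sims, weights):
    """feature_sims maps feature name -> similarity, or None when the feature is missing.
    Missing features are dropped and the remaining weights renormalised; if every
    feature is missing, the contribution is 0, which drags down the total score."""
    present = {f: s for f, s in feature_sims.items() if s is not None}
    if not present:
        return 0.0
    total_weight = sum(weights[f] for f in present)
    return sum(weights[f] * s for f, s in present.items()) / total_weight
```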

A better diversity is expected from the bag-of-words based model, since the diversity measure uses the domain knowledge based similarity measure for both models. This choice was made in order to make the measure comparable between the two models. However, since the purpose of the domain knowledge similarity measure was to increase the similarity between wines with different but similar characteristics, a lower diversity is expected.
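
For reference, a sketch of an average pairwise dissimilarity style diversity measure, assuming sim is the domain knowledge based similarity used for both models:

```python
from itertools import combinations

def diversity(recommended, sim):
    """Average pairwise dissimilarity (1 - similarity) within one recommendation list."""
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - sim(a, b) for a, b in pairs) / len(pairs)
```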

The response rate of the questionnaire was rather low, but this was expected. The choice of sending it to a small target group with domain knowledge was made to ensure that the answers would be qualitative and thus give a good indication of the results of a larger test.

The result of the questionnaire shows that neither model performs significantly better than the other, and that users prefer different kinds of recommendations. Some users liked the model that presented them with a list of very similar items, and some liked the model that gave diverse items. Also, price seems to be a very important factor in choosing the best model, and thus should have been a feature in the recommender system's similarity measure.

8.1 The bag-of-words based recommender system

As free-text similarity is a well-researched field in machine learning, there are several models available, one of them being the BOW approach. With Python's Scikit-Learn library, this model is easily implemented, making this approach effortless and easy to use.
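
As a sketch of how little code such a setup requires (the example descriptions are made up, and a TF-IDF weighted vectorizer could be swapped in just as easily):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = [
    "dry red wine with notes of dark cherry and oak",
    "fresh and fruity white wine with citrus and green apple",
    "full-bodied red wine with blackberry and a hint of vanilla",
]

# Bag-of-words vectors for each item description
bow_vectors = CountVectorizer().fit_transform(descriptions)

# Pairwise cosine similarity between all items
similarity = cosine_similarity(bow_vectors)
print(similarity.round(2))
```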

A big advantage of using a vector representation based model is that it is applicable to any category of recommendations: the same model can be trained to recommend movies, wine or clothes. However, since this model lacks semantic understanding and domain knowledge, it will sometimes miss important nuances in its recommendations. For this specific dataset, this problem is relatively limited, as the features are mostly categorical and synonyms etc. are therefore not used.

Another advantage of the bag-of-words based model was that the training time was much shorter: it only took a few seconds before the system was ready for use, compared to the domain knowledge based model, which took several hours to train.

8.2 The domain knowledge based recommender system

Developing the domain knowledge based recommender system using this particular similarity measure and setup was relatively simple. The similarity measure is based on three features, grapes, region of origin and year of origin, which are limited in variety. That is, when new items are added to the dataset, these similarity measures will still be applicable in most cases. In some cases new features might have to be added, but upkeep will probably not be very time consuming.
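
As an illustration of the kind of per-feature rules this involves, the sketch below compares regions via the climate classes in Appendix B and vintages via ratings of the kind tabulated in Appendix C; the specific scores are placeholders, not the values used in the project.

```python
def region_similarity(region_a, region_b, climate_class):
    """climate_class maps region -> 'cool', 'intermediate' or 'warm' (cf. Appendix B)."""
    if region_a == region_b:
        return 1.0
    a, b = climate_class.get(region_a), climate_class.get(region_b)
    if a is not None and a == b:
        return 0.5   # same climate class (placeholder score)
    return 0.0

def vintage_similarity(rating_a, rating_b):
    """Vintages compared via their 1-5 ratings (cf. Appendix C); closer ratings score higher."""
    return 1.0 - abs(rating_a - rating_b) / 4.0
```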

One of the biggest drawbacks of this approach is the training time. As a rather big matrix has to be put together, constructed from several calculated similarity measures that are weighted together, it is a rather time-consuming process, and as more items are added to the dataset, the training time will increase. However, as the focus of this project was not to build a computationally efficient RS, there are several measures that can be taken to optimize the training process. One example is to construct similarity matrices for each feature, in order to only have to make the calculations once and afterwards simply look up a number in the matrix.
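
A sketch of that optimisation: compute a value-by-value similarity matrix per feature once, then replace repeated calculations with a lookup (the helper and the example regions are only illustrative).

```python
import numpy as np

def precompute_similarity_matrix(values, sim):
    """Pairwise similarities between all distinct feature values, computed once."""
    index = {v: i for i, v in enumerate(values)}
    matrix = np.array([[sim(a, b) for b in values] for a in values])
    return index, matrix

# Computed once at start-up, then used as a lookup while training or recommending:
# index, matrix = precompute_similarity_matrix(all_regions, some_region_similarity)
# score = matrix[index["Rioja"], index["Priorat"]]
```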


9 Conclusions

The aim of the project was to investigate whether implementing domain knowledge in the similarity measure of a recommender system would improve the performance of the model. The dataset used in the project was Systembolaget's product information dataset, filtered to only contain wines. A hybrid domain knowledge based recommender system was developed, using domain knowledge for three of seven features. This model was then compared to a recommender system using the bag-of-words approach with cosine similarity.

The results show that the approach used in this project did not have a significant enough effect on the recommendations to motivate using such a model for this specific dataset. However, missing values for the features used by the domain knowledge based recommender system could have affected the results, and further testing would be necessary in order to draw any general conclusions about using domain knowledge for wine recommendations.

9.1 Future work

Due to time constraints, only specific approaches and their effects could be investigated. Other interesting factors, such as weighting each feature individually to find the best composition, or trying different constellations of features and studying how they affect the result, could be part of a future extension of the project.
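
One way such an experiment could be set up is a simple grid search over per-feature weights, sketched below with a placeholder evaluation function.

```python
from itertools import product

def grid_search_weights(evaluate, steps=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try every combination of weights for grapes, region and vintage, keeping the
    combination that scores best according to the supplied evaluate function."""
    best_weights, best_score = None, float("-inf")
    for weights in product(steps, repeat=3):
        score = evaluate(*weights)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```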

After the user tests, it was clear that having price as a feature in the similarity measure would have been a good idea, as many people commented on it even though they were asked not to consider price as a factor.

A problem in testing the performance of the models is that the knowledge based recommender system does not perform as well when given input items that do not contain the features used by the domain knowledge similarity measures. Testing the system only on items containing these features could thus affect the results, and this could be interesting to look at in the future.

Another important factor in building a good recommender system is to have access to an accurate user profile. In this project, the user profile was built from a single wine, and thus does not provide a very extensive base for recommendations. It would be interesting to see how the two models used in this project would perform if given access to more well-documented user preferences.



Appendix A: Grape similarity tree

Red wine grapes

red
  fruity
    tart cherry & cranberry
      round: pinot noir
      spicy/juicy: gamay
    strawberry & cherry
      spicy: zinfandel, barbera, carmenere, grenache, negroamaro, primitivo
    black cherry & raspberry
      high tannin: tempranillo, cabernet sauvignon
      spicy: cabernet franc, sangiovese
      round: corvina
    blueberry & blackberry
      round: syrah, malbec, merlot, nero d'avola
      high tannins: mourvedre, touriga nacional, petit sirah
      spicy: shiraz
  savory
    clay & cured meats
      high tannins: nebbiolo
      round: brunello
    truffle & forest
      spicy/juicy: pinotage
    smoke, tobacco & leather
      high tannins: aglianico, tannat
    black pepper & gravel
      spicy/juicy: montepulciano
  sweet

White wine grapes

white
  dry
    light, grapefruit & floral: pinot blanc, verdicchio, vermentino
    light, citrus & lemon: pinot gris/pinot grigio, melon de bourgogne, albarino, grüner veltliner
    light, herbal & grassy: sauvignon blanc
    rich, creamy & nutty: savagnin, chardonnay, godello, garganega (soave)
    medium, perfume & floral: viognier, torrontes, semillon, furmint (tokaji), marsanne
  sweet
    rich, tropical & honey: muskat
    off-dry, apricots & peaches: chenin blanc, gewürztraminer, riesling

Sparkling wine grapes

sparkling
  red
  rose
  white
    dry, creamy & rich: vintage champagne
    dry, light & citrus: champagne, pinot meunier, pinot noir, cava, macabeo
    semi-sweet & floral: prosecco, glera
    sweet, apricots & rich

Appendix B: Wine region climate table

Cool | Intermediate | Warm
Rioja | Venetien | Apulien
Champagne | Tokaj-Hegyalja | Cava
Pfalz | Piemonte | Lisboa
Bourgogne | Bordeaux | Kalifornien
Loiredalen | Castilla y Leon | Istra
Marlborough | Valle Central | Western Cape
Wellington | Toscana | Rhonedalen
Mosel | Region del Sur | South Australia
Morava | Washington State | Southeastern Australia
Rheingau | VDLT Castilla | Cuyo
England | Marche | Western Australia
Alsace | Emilia-Romagna | Aconcagua
Nieder Osterreich | Alicante
Lombardiet | Frankrike sydvast | Douro
Rheinhessen | Dobrogea | Languedoc-Roussillon
Gotlands lan | Terra Alta | Priorat
Skane lan | Manchuela | Trakien
Ribeira Sacra | Abruzzerna | Sardinien
Canterbury | Beira | Valencia
Baden | La Mancha | Sicilien
Kalmar lan | Costers del Segre | Savoie
Victoria | Oregon | Kampanien
Kakheti region | Vino Spumante Di Qualita Del Tipo Aromatico | Serra Gaucha
Valdeorras | Tasmanien | Tejo
Burgenland | Navarra | Danube Plain
Jura | Del-Balaton | Ribera del Duero
Podravina | Vinos de Madrid | Catalunya
Blekinge lan | Rueda | Montsant
New York State | Minho | Penedes
Rías Baixas | Arlanza | Bierzo
La Rioja | Valais | Carinena
Mosel-Saar-Ruwer | Central Otago | Peloponnesos
Rhein | Toro | Hawke's bay
Wurttemberg | Nagy-Somloi | New South Wales
Sodermanlands lan | Jumilla | Salta
Sekt | Friuli-Venezia-Giulia | Rapel
Nelson | Ligurien | Trentino-Altoadige
Franken | Nahe | Eger
Jamtlands lan | VDLT de Murcia | Terras do Sado
Gisborne | Valdepenas
Utiel-Requena | Korsika
Somontano | Maipo
Campo de Borja | Maule
Malaga | Mediterranee
Umbrien | Bekaa
British Columbia | Attica
Duna-Tisza Kozi | Toledo
Ribeiro
Getariako Txakolina
Sopron
Península de Setubal
Santorini
Znojmo
Primorski
Latium
Alentejo
Golanhojderna
Patagonien
Kalabrien
Yecla
Provence
Makedonien
Coquimbo
Molise
Salamanca
Valle de la Orotava
Vale dos Vinhedos
Povardarje
Samos
Kreta
Primorska Hrvatska

Appendix C: Year of production

Region 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003

South Australia 5 5 5 3 2 5 4 4 3 4 4 3 4

Champagne 4 4 4 4 2 3 4 4 3 4 4 4 3

Rhonedalen 4.50 3.50 4 4 3 4.50 4.50 3 4.50 4 5 4 3.50

Bourgogne 4.50 4.50 3 3.50 2.50 4 4.50 3 3 2.50 4 2.50 2.50

Bordeaux 4 3.50 3 3 4 4 4 3.50 3.50 3 5 3 4

Toscana 5 3 4 3 4 3 3 3 4 4 2 4 3

Piemonte 4 3 4 4 3 4 3 4 4 4 2 4 3

Marlborough 5 5 4 3 4 5 4

Vintage Port 5 4 3 3 5 4 4 3 4 3 5 3 4

Priorat 4 4 4 4 4 5 4

Rioja 5 3 3 4 4 5 4 3 4 3 4 4 3

Coastal Region 4 3 4 4 3 3 5

Mosel-Saar-Ruwer 5 3 3 4 3 4 5 4 4 3 4 4 3

Kalifornien 5 4 5 5 4 3 4 4 3 4 3 3 4

Wachau 4 2 4 4 4 3 4


Region 2002 2001 2000 1999 1998 1997 1996 1995 1994 1993 1992 1991 1990

South Australia 4 4 3 3 5 3 4 3 5 3 3 4 4

Champagne 5 1 2 4 3 2 5 4 2 3 2 2 5

Rhonedalen 1.50 4 4.50 5 5 3.50 3 5 4 3 2 3 5

Bourgogne 4 2.50 3.50 4 3.50 4 5 4.50 3 3.50 3.50 3 4.50

Bordeaux 3.50 4.50 3.50 4 3.50 3.50 4 4 3.50 2.50 2 2 5

Toscana 2 4 3 4 4 5 3 5 3 4 1 3 5

Piemonte 2 4 4 3 5 5 4 4 2 3 2 3 5

Marlborough

Vintage Port 2 3 4 1 3 4 2 3 5 1 4 4 2

Priorat

Rioja 3 5 3 4 4 3 4 5 5 3 3 3 4

Coastal Region

Mosel-Saar-Ruwer 4 5 3 4 4 4 4 4 4 4 4 3 5

Kalifornien 3 5 3 4 3 5 4 5 4 3 4 5 4

Wachau


Appendix D: Questionnaire answers

Timestamp | Which of the recommendations do you prefer? | Why?
5/7/2018 14:35:08 | A |
5/7/2018 16:16:20 | B |
5/8/2018 10:24:05 | A | Good recommendations from both, but B recommended a Tetra Pak wine. No thanks!
5/8/2018 10:30:02 | B | I entered a Cava at about 120 SEK (7610). A mostly recommended champagnes that were very much more expensive. B recommended other Cavas. I might have wished for something more than only Cava recommendations, but B at least had a reasonable price level compared to my wine.
5/8/2018 14:35:58 | B | Because B gave me more relevant wines in the same price range. I chose a rather expensive prestige champagne and B managed to hit some other good champagnes in the same price range, while A recommended, for example, a sparkling wine from Italy.
5/8/2018 15:30:22 | A | Both presented relevant wines, but method A went more in the direction I am looking for: older, more expensive and flavourful Amarone.
5/8/2018 15:36:03 | A | Full-bodied, flavourful, from Côtes du Rhône, in the more expensive direction
5/8/2018 15:39:51 | B | I was looking for a summery and fresh white in the cheaper direction, which I reckon B was better at
5/15/2018 22:49:06 | B | Corresponds best in terms of quality
5/15/2018 22:50:03 | B | Most relevant
5/15/2018 22:51:54 | B | Good variety
5/15/2018 22:52:50 | B | Most surprising
5/15/2018 22:53:17 | B | Most surprising
5/23/2018 22:08:58 | A | Lies close to the original choice
5/23/2018 22:22:47 | A | Better prices, roughly the same quality
5/25/2018 10:58:48 | A | B gave two links that did not work.
5/25/2018 11:01:31 | B | I thought those wines were better.
5/25/2018 11:02:56 | B | Wines that were suited to what I was looking for.
5/25/2018 11:04:45 | B | More suited.
5/25/2018 11:04:57 | A | Wines that appealed.
5/25/2018 13:39:14 | A | Better mix among the recommended wines: variation in country, variation in grapes, but also a couple with the same grape. Yum!
5/25/2018 13:49:28 | A | Better mix among the recommendations.
5/29/2018 8:53:19 | B | Both sets of recommendations were poor, since I only got white wines, as far as I could see, based on a red
5/29/2018 8:59:17 | A | Only white wines again based on a 2015; got the same based on a 2014