Hypothesis-Based Collaborative Filtering

Amancio Bouza, PhD Defense, University of Zurich, Switzerland, February 10th, 2012
Hypothesis-Based Collaborative Filtering: Retrieving Like-Minded Individuals Based on the Comparison of Hypothesized Preferences


Asian Food
Italian Food



Online Retailers vs. B&M Retailers
Revenue from Long Tail: 25% at Amazon [Anderson, 2006]
Amazon.com: 2.3 million products [Brynjolfsson et al., 2003]
B&M: 40,000-100,000 products [Brynjolfsson et al., 2003]
Information Overload
Overchoice


Recommender Systems are Essential for Welfare
Mitigate negative effects [Hinz et al., 2011]
Collective wisdom: collaborative filtering
Consumer welfare increase of $731 million to $1.03 billion [Brynjolfsson et al., 2003]
Book market: consumer welfare enhanced 7-10 times more than through competition and lower prices [Brynjolfsson and Smith, 2000]
Essential for Economic and Public Welfare


Collaborative Filtering
Individuals who shared similar preferences in the past will share similar preferences in the future.

Issues of Common Rated Products: Cold-Start Problem
Significance of sparsity
Partial representation of preferences
Assessability of preference similarity
Incompleteness of preferences


Thesis: Hypothesis-Based Collaborative Filtering


Thesis Overview
Preference Modelling
Preference Similarity
Evaluation & Analysis
Hypothesized Preference Modelling
Chapter 3: Conceptualization and Specification of Preferences
Figure 3.1: Partial preferences are encoded as branches from the root to a leaf of the decision tree. The nodes of a branch correspond to tests of product properties, and the leaf corresponds to the utility of the product.
We start from the Naïve Bayes classification rule, which we described in Eq. (2.4) and recapitulate at this point:
\[
C \leftarrow \arg\max_{c_k} \; P(C = c_k) \prod_j P(A_j \mid C = c_k) \tag{3.7}
\]
This rule calculates the most probable utility based on the observed probability distribution of P(C) and P(A_j | C).
The Naïve Bayes classification rule is, in fact, a linear combination of all conditional probabilities of P(C) and P(A_j | C). Since Naïve Bayes assumes strong (naïve) independence among properties, an individual's preferences for product properties are likewise independent. Therefore, we can interpret each conditional probability P(C) · P(A_j | C) as a hypothesized partial preference with the set of constraints consisting of the single constraint A_j as the condition and the most probable conclusion c as the utility:
\[
C \leftarrow \arg\max_{c_k} \; P(C = c_k) \cdot P(A_j \mid C = c_k) \tag{3.8}
\]
Applying Bayes' theorem, we can simplify the definition of hypothesized partial preferences in Eq. (3.8) and write them as:
\[
C \leftarrow \arg\max_{c_k} \; P(C = c_k \mid A_j) \tag{3.9}
\]
To extract all hypothesized partial preferences, we compute for each attribute the most probable utility.
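To make this extraction step concrete, here is a minimal sketch (not the thesis implementation; the attribute and rating values are illustrative) that estimates P(C) and P(A_j | C) by counting over rated products and then derives, for each single attribute constraint, the most probable utility as in Eq. (3.8):

```python
from collections import Counter, defaultdict

# Toy training data: each rated product is (attributes, rating class c_k).
# Attribute and rating values are illustrative, not taken from the thesis.
rated_products = [
    ({"genre": "comedy", "decade": "1990s"}, "like"),
    ({"genre": "comedy", "decade": "2000s"}, "like"),
    ({"genre": "horror", "decade": "1990s"}, "dislike"),
    ({"genre": "horror", "decade": "2000s"}, "dislike"),
]

# Estimate P(C = c_k) and P(A_j | C = c_k) by simple counting.
class_counts = Counter(c for _, c in rated_products)
cond_counts = defaultdict(Counter)  # (attribute, value) -> Counter over classes
for attrs, c in rated_products:
    for attr, value in attrs.items():
        cond_counts[(attr, value)][c] += 1

def p_class(c):
    return class_counts[c] / len(rated_products)

def p_attr_given_class(attr, value, c):
    return cond_counts[(attr, value)][c] / class_counts[c]

# Hypothesized partial preferences (Eq. 3.8): for each single constraint A_j,
# pick the utility c_k that maximizes P(C = c_k) * P(A_j | C = c_k).
partial_preferences = {}
for (attr, value) in cond_counts:
    best = max(class_counts,
               key=lambda c: p_class(c) * p_attr_given_class(attr, value, c))
    partial_preferences[(attr, value)] = best

print(partial_preferences)
# e.g. {('genre', 'comedy'): 'like', ('genre', 'horror'): 'dislike', ...}
```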
Grounded Theory
[Figure: MAE of HypokNN-NBCompSup as a function of the individual's effort and the product's visibility (effort and visibility 0-20, MAE roughly 0.65-1.1).]
Ontology-based Decision Tree Learner: SemTree
4.2 SemTree extension to the decision tree model
4.2.2 Injecting concept features to generalize from features
The logic of feature generalization by injecting concept features is presented in Figure 4.1. The concepts that can be used for concept feature generation are given by an ontology. We use the rdfs:subClassOf and rdf:type properties, which connect instances with concepts and concepts with each other.
Let I_i ∈ {I_1, …, I_n} denote the feature vector representation of item i. The classification associated with item i is denoted C_i ∈ {C_1, …, C_n}. Further, D, J, S, E, A, and L represent ontology instances, and U, Z, H, and X represent classes and superclasses.
Figure 4.1: SemTree uses the generalization of single concepts to improve the classification. The ontology provides the classes U, Z, H, and X; for example, J and S are instances of class U. The instance table:

        D     J     S    ...   E     A     L    Class
I1      no    no    no   ...   yes   no    yes   C2
I2      no    no    yes  ...   yes   no    no    C1
I3      no    yes   yes  ...   no    no    no    C1
I4      yes   no    no   ...   no    yes   no    C2
In      yes   yes   no   ...   no    no    no    C1
If we look at the two instances where the feature value for feature D is set to yes (I4 and In), the instances are classified once to class C1 and once to C2. Therefore, the feature D does not provide evidence for the classification of the instances.

The features J and S individually behave like the previous case and do not provide evidence for the instance classification either. Since J and S are instances of the ontology class U, we can combine (generalize) both features into the concept feature U, which is used as a decision node in the tree only if it provides a greater information gain. To calculate the information gain, we have to set the value of the concept feature for each instance: if one of the member feature values of an instance is set to yes, the feature value of U is also set to yes. In Figure 4.1 this is depicted with a grey background; in boxes with a grey background the feature value is yes, otherwise no.

In the next case we go a step further. Again, the features E, A, and L do not provide evidence for the classification. However, the information gain of the concept features X and H is not greater than that of the individual features, so we discard the concept features and use the features E, A, and L instead. In the example of Figure 4.1, the feature or concept feature with the highest information gain is the concept feature U, which is therefore used as the decision node in the tree.
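A minimal sketch of this concept-feature injection (the entropy-based information gain is the standard formulation; the data mirrors the toy example of Figure 4.1 and is otherwise illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(feature_values, labels):
    """Information gain of a (binary) feature over the class labels."""
    gain = entropy(labels)
    for v in set(feature_values):
        subset = [c for fv, c in zip(feature_values, labels) if fv == v]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

# Toy data following Figure 4.1: features J and S are instances of concept U.
instances = {
    "J": [False, False, True, False, True],
    "S": [False, True, True, False, False],
}
labels = ["C2", "C1", "C1", "C2", "C1"]

# Inject the concept feature U: yes if any member feature (J or S) is yes.
concept_u = [j or s for j, s in zip(instances["J"], instances["S"])]

gain_u = information_gain(concept_u, labels)
gain_members = max(information_gain(v, labels) for v in instances.values())

# Use the concept feature as a decision node only if its gain is greater.
chosen = "U" if gain_u > gain_members else "best individual feature"
print(gain_u, gain_members, chosen)  # the concept feature U wins here
```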
Empirical Study
[Figure: MAE versus rating sparsity degree (0%-90%; MAE axis 0.750-1.025) for Naïve Bayes, J48, SVD, PcorrkNN, HypokNN-J48CompSemSup, HypokNN-NBCompSup, HypokNN-semiJ48Utilprob, and HypokNN-NBUtilcorr.]
Semantic Jaccard Index
Hypothesized Preference Similarity
Hypothesis Composition-Based Preference Similarity
Chapter 5: Hypothesized Preference Similarity
Figure 5.1: The partial preference similarity matrix S (with elements s_ij) represents the similarities between the hypothesized partial preferences h^a_1, …, h^a_m of individual a and h^b_1, …, h^b_n of individual b. Also shown are the vectors v_a and v_b and the rating concepts c_1, …, c_5.
We define the similarity between the hypothesized preferences h^a(g) and h^b(g) as the similarity function sim(v_a ∘ v_b^T) : R^{m×n} → [0, 1] ⊂ R, which consolidates all partial preference similarities in S. Hence, the theoretical framework of our algorithmic framework is:
\[
\mathrm{sim}\big(h^a(g), h^b(g)\big) \equiv \mathrm{sim}(v_a \circ v_b^{T}) \tag{5.28}
\]
Our algorithmic framework of hypothesis composition-based preference similarity consists of two components. One component is a method to compute the similarity of hypothesized partial preferences to determine S, and refers to Eq. (5.26); this component is presented in Section 5.3.1. The other component is a method to consolidate the similarities of S, and refers to Eq. (5.28); this component is presented in Section 5.3.2.
5.3.1 Similarity of Hypothesized Partial Preferences
As defined in Section 3.2, a hypothesized partial preference h_i(g) consists of a set of constraints p_i and the assigned rating c_i. Based on this, we define the similarity between two hypothesized partial preferences h^a_i(g) and h^b_j(g) as the similarity between the corresponding constraint sets p^a_i and p^b_j combined with the similarity between the corresponding ratings c^a_i and c^b_j:
\[
\mathrm{sim}\big(h^a_i(g), h^b_j(g)\big) \equiv \mathrm{sim}(p^a_i, p^b_j) \cdot \mathrm{sim}(c^a_i, c^b_j) \tag{5.29}
\]
As depicted in Figure 5.1, the similarity sim(p^a_i, p^b_j) · sim(c^a_i, c^b_j) corresponds to the element s_ij in the partial preference similarity matrix S, i.e., s_ij = sim(p^a_i, p^b_j) · sim(c^a_i, c^b_j).
This algorithmic framework allows for any kind of similarity metric to compute the similarity between constraint sets and the similarity between rating concepts. In the following, we propose two possible similarity metrics for computing sim(p^a_i, p^b_j) and a similarity metric for computing sim(c^a_i, c^b_j).
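A minimal sketch of this framework under simple stand-in metrics: Jaccard overlap of constraint sets for sim(p^a_i, p^b_j), normalized rating distance for sim(c^a_i, c^b_j), and averaging the row-wise maxima of S as the consolidation step. The thesis proposes its own metrics (e.g., the Semantic Jaccard Index) and consolidation methods; everything below is illustrative.

```python
# Hypothesized partial preferences of two individuals a and b:
# each is (constraint set p_i, rating c_i). Values are illustrative.
prefs_a = [({"genre=comedy"}, 4), ({"genre=horror", "decade=1990s"}, 1)]
prefs_b = [({"genre=comedy", "decade=2000s"}, 5), ({"genre=horror"}, 2)]

def sim_constraints(p_i, p_j):
    """Stand-in for sim(p_i, p_j): Jaccard overlap of the constraint sets."""
    return len(p_i & p_j) / len(p_i | p_j) if p_i | p_j else 0.0

def sim_ratings(c_i, c_j, max_rating=5):
    """Stand-in for sim(c_i, c_j): 1 minus normalized rating distance."""
    return 1.0 - abs(c_i - c_j) / (max_rating - 1)

# Partial preference similarity matrix S (Eq. 5.29): s_ij = sim(p) * sim(c).
S = [[sim_constraints(pa, pb) * sim_ratings(ca, cb)
      for pb, cb in prefs_b]
     for pa, ca in prefs_a]

# Consolidation (Eq. 5.28), assumed here: average of the best match per row.
similarity_ab = sum(max(row) for row in S) / len(S)
print(S, similarity_ab)
```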
Hypothesized Utility-Based Preference Similarity
User function: u(i) = c_k
Hypothesized user function: h(i) + ε(i) = c_k, with h : i ↦ c_k
Hypothesized user function: h(i) → c_k
Hypothesized user function of individual a: h^a(i) → c_k
Hypothesized user function of individual b: h^b(i) → c_k
User-based collaborative filtering:
\[
\hat{r}_{aj} = \bar{r}_a + \kappa \sum_{b \neq a}^{n} \mathrm{sim}(a, b) \cdot (r_{bj} - \bar{r}_b)
\]
Normalization factor κ:
\[
\kappa = \frac{1}{\sum_{b \neq a}^{n} \mathrm{sim}(a, b)}
\]
Example calculation:
\[
\hat{r}_{a2} = 3 + \frac{0.875 \cdot (4 - 3.66) + 0.25 \cdot (1 - 2.33)}{0.875 + 0.25} = 3 + \frac{0.298 - 0.333}{1.125} = 2.969
\]
User model: u_a(i) = h_a(i) + ε(i)
User model: u(i) = h(i) + ε(i)
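A minimal sketch of the user-based prediction formula above, reproducing the example calculation (the neighbour similarities, ratings, and means come from the example; function and variable names are mine):

```python
def predict_rating(r_a_mean, neighbours):
    """
    User-based collaborative filtering prediction:
    r_hat = r_a_mean + kappa * sum sim(a,b) * (r_bj - r_b_mean),
    with kappa = 1 / sum sim(a,b).
    neighbours: list of (sim(a, b), r_bj, r_b_mean).
    """
    kappa = 1.0 / sum(sim for sim, _, _ in neighbours)
    weighted = sum(sim * (r_bj - r_b_mean) for sim, r_bj, r_b_mean in neighbours)
    return r_a_mean + kappa * weighted

# Example from the slide: mean rating of a is 3, two neighbours.
neighbours = [(0.875, 4, 3.66), (0.25, 1, 2.33)]
print(round(predict_rating(3, neighbours), 3))  # ~= 2.969
```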

Evaluation: Empirical Study



Experimental Setting
Dataset: MovieLens 100k (quasi benchmark)
10 datasets with different degrees of rating sparsity
Method: k-fold cross-validation (k=5)
Performance metrics:
Rating prediction accuracy: MAE, RMSE
Relevance filtering quality: Precision, Recall, F1-score, MCC, AUC
Candidates for comparison:
Hypothesis-based: 11 hypothesis-based collaborative filtering methods
Baseline: 3 collaborative filtering methods (SVD, PcorrkNN, WoC) and 4 content filtering methods
Statistical test: non-parametric Wilcoxon signed-rank test for dependent samples, alpha = 0.01
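To illustrate the statistical test, a minimal sketch using SciPy's Wilcoxon signed-rank test on paired MAE values of two methods over the same evaluation folds (the numbers are illustrative, not results from the thesis):

```python
from scipy.stats import wilcoxon

# Paired MAE values of two methods over the same folds/datasets
# (illustrative numbers, not taken from the evaluation).
mae_method_a = [0.76, 0.78, 0.79, 0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86]
mae_method_b = [0.81, 0.84, 0.86, 0.88, 0.90, 0.92, 0.94, 0.96, 0.98, 1.00]

statistic, p_value = wilcoxon(mae_method_a, mae_method_b)
alpha = 0.01
print(p_value, "significant" if p_value < alpha else "not significant")
```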

Results for the hypothesis-based candidate HypokNN-NBCompSup:

             0% sparsity   90% sparsity
MAE              0.76          0.86
RMSE             0.97          1.09
Precision        0.84          0.75
Recall           0.54          0.52
AUC              0.66          0.56

significant
[Figure: MAE versus rating sparsity degree (0%-90%; MAE axis 0.750-1.025) for Naïve Bayes, J48, SVD, PcorrkNN, HypokNN-J48CompSemSup, HypokNN-NBCompSup, HypokNN-semiJ48Utilprob, and HypokNN-NBUtilcorr.]

Analysis: Grounded Theory
Properties:
Individual's effort: number of ratings
Individual's attitude: rating mean
Individual's selectivity: rating standard deviation
Product's visibility: number of ratings
Product's popularity: rating mean
Product's polarization: rating standard deviation
Performance: MAE
Performance difference between HypokNN-CF methods: difference of MAE
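A minimal pandas sketch computing the individual- and product-level properties listed above from a ratings table (the table layout and column names are illustrative assumptions):

```python
import pandas as pd

# Illustrative ratings table: one row per (user, item, rating).
ratings = pd.DataFrame({
    "user":   [1, 1, 1, 2, 2, 3],
    "item":   [10, 11, 12, 10, 12, 11],
    "rating": [5, 3, 4, 2, 4, 1],
})

# Individual-level properties.
individual = ratings.groupby("user")["rating"].agg(
    effort="count",       # number of ratings
    attitude="mean",      # rating mean
    selectivity="std",    # rating standard deviation
)

# Product-level properties.
product = ratings.groupby("item")["rating"].agg(
    visibility="count",   # number of ratings
    popularity="mean",    # rating mean
    polarization="std",   # rating standard deviation
)

print(individual)
print(product)
```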
[Figure: MAE of HypokNN-NBCompSup as a function of the individual's effort and the product's visibility (effort and visibility 0-20, MAE roughly 0.65-1.1).]

Comparison of CF Methods
[Figure: Difference in MAE (dMAE) between HypokNN-NBCompSup and PcorrkNN and between HypokNN-J48CompSemSup and PcorrkNN, plotted over the individual's effort (0-20) and the product's visibility (0-20), at 0% and 80% sparsity.]

When Hypothesis-Based Collaborative Filtering
Minor cold-start
[Table: relative performance (+/−) of hypothesis-based collaborative filtering for low and high values of the individual's effort and selectivity and of the product's visibility and popularity.]
Verified by empirical evaluation
[Figure: MAE versus rating sparsity degree (0%-90%; MAE axis 0.725-0.925) for PcorrkNN, GT-HypokNN-NBCompSup, HypokNN-NBCompSup, GT-HypokNN-semiJ48Utilprob, HypokNN-semiJ48Utilprob, GT-HypokNN-NBUtilprob, and HypokNN-NBUtilprob.]

Thank you!