RecSys 2013 – 7th ACM Conference on Recommender Systems October 12-16, 2013 Hong Kong, China
Top-N Recommendations from Implicit Feedback
leveraging Linked Open Data
Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi
Polytechnic University of Bari - Bari (ITALY)
Outline
Introduction and motivation
SPrank: Semantic Path-based ranking
• Data model and Problem formulation
• Path-based features
• Learning the ranking function
Experimental Evaluation
Contributions and Conclusion
Linked Open Data
• Initiative for publishing and connecting data on the Web using Semantic Web technologies;
• More than 30 billion RDF triples from hundreds of data sources;
• Semantic Web done right [ http://www.w3.org/2008/Talks/0617-lod-tbl/#(3) ]
Hong Kong in DBpedia
[Figure: the DBpedia resource db:Hong_Kong with its db:thumbnail; the resource is described by 8134 triples of the form (subject, predicate, object).]
Skyscrapers over 350 meters in Hong Kong?
select * where {
  ?s dbpedia-owl:location <http://dbpedia.org/resource/Hong_Kong> .
  ?s dcterms:subject category:Skyscrapers_over_350_meters .
}
[Figure: graph view of the result. db:International_Commerce_centre and db:Central_Plaza_(Hong_Kong) are each linked to db:Hong_Kong by db:location and to db:category:Skyscrapers_over_350_meters by dcterms:subject; each resource also has a db:thumbnail.]
Motivation
Traditional Ontological/Semantic Recommender Systems:
• make use of limited domain ontologies;
• rely on explicit feedback data;
• address the rating prediction task.
But…
• a lot of structured semantic data is available on the Web;
• implicit feedback is easier to collect;
• Top-N recommendation is a more realistic task.
Challenge:
• compute Top-N Item Recommendations from implicit feedback exploiting the Web of Data.
Our approach
• Usage of structured semantic data freely available on the Web (Linked Open Data) to describe items
[Figure: DBpedia ontology]
• Analysis of complex relations between the user preferences and the target item (extraction of path-based features)
• Formalization of the Top-N Item recommendation problem from implicit feedback in a Learning To Rank setting
Data model
Implicit Feedback Matrix $\hat{S}$
     i1  i2  i3  i4
u1    1   1   0   0
u2    1   0   1   0
u3    0   1   1   0
u4    0   1   0   1
[Figure: Knowledge Graph]
Problem formulation
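Set of relevant items for u: $I_u^+ = \{ i \in I \mid \hat{s}_{ui} = 1 \}$
Set of irrelevant items for u: $I_u^- = \{ i \in I \mid \hat{s}_{ui} = 0 \}$
Sample of irrelevant items for u: $I_u^{-*} \subseteq I_u^-$
Feature vector: $x_{ui} \in \mathbb{R}^D$
Training Set: $TR = \bigcup_u \{ \langle x_{ui}, \hat{s}_{ui} \rangle \mid i \in (I_u^+ \cup I_u^{-*}) \}$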
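A minimal Python sketch of how such a training set could be assembled. The function name `build_training_set`, the `features` callback, the uniform negative sampling and the `n_neg_per_user` parameter are assumptions made for illustration; the slides do not say how the sample of irrelevant items is drawn.

```python
import random

def build_training_set(feedback, items, features, n_neg_per_user, seed=0):
    """Return a list of (feature_vector, label) pairs.

    feedback: dict mapping each user to the items with implicit feedback 1.
    features: hypothetical callback (user, item) -> feature vector x_ui.
    ASSUMPTION: irrelevant items are sampled uniformly at random.
    """
    rng = random.Random(seed)
    training = []
    for user, liked in feedback.items():
        relevant = set(liked)                       # items with s_ui = 1
        irrelevant = sorted(set(items) - relevant)  # items with s_ui = 0
        k = min(n_neg_per_user, len(irrelevant))
        sampled = rng.sample(irrelevant, k)         # sample of irrelevant items
        for item in sorted(relevant):
            training.append((features(user, item), 1))
        for item in sampled:
            training.append((features(user, item), 0))
    return training

# Feedback taken from the implicit feedback matrix of the data model slide.
FEEDBACK = {
    "u1": ["i1", "i2"],
    "u2": ["i1", "i3"],
    "u3": ["i2", "i3"],
    "u4": ["i2", "i4"],
}
ITEMS = ["i1", "i2", "i3", "i4"]

# Placeholder features: the (user, item) pair itself stands in for x_ui.
TRAINING = build_training_set(FEEDBACK, ITEMS, lambda u, i: (u, i), n_neg_per_user=1)
```

Sampling only some irrelevant items per user keeps the training set small, since most entries of the feedback matrix are 0.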
Path-based features
$x_{ui}(j) = \frac{\#path_{ui}(j)}{\sum_{d=1}^{D} \#path_{ui}(d)}$
path: acyclic sequence of relations $(s, \dots, r_l, \dots, r_L)$
$\#path_{ui}(j)$: frequency of $path_j$ in the sub-graph related to u and i
• The more paths there are, the more relevant the item is.
• Different paths have different meanings.
• Not all types of paths are relevant.
Example: u3 -s- i2 -p2- e1 -p1- i1 corresponds to the path (s, p2, p1)
[Figure: example graph with users u1 to u4, items i1 to i4 and entities e1 to e5.]
What is $x_{u_3 i_1}$?
Paths between u3 and i1:
path1 (s, s, s): 2
path2 (s, p2, p1): 2
path3 (s, p2, p3, p1): 1
$x_{u_3 i_1}(1) = \frac{2}{5}$
$x_{u_3 i_1}(2) = \frac{2}{5}$
$x_{u_3 i_1}(3) = \frac{1}{5}$
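The path counting for the pair (u3, i1) can be sketched in Python as follows. The feedback edges come from the implicit feedback matrix; the item-entity edges in `PROPERTIES` are not recoverable from the transcript and are an assumption, chosen only so that the counts match the example (2, 2 and 1). The function names, the undirected traversal and the `max_len` limit are also assumptions.

```python
from collections import Counter, defaultdict

# Feedback edges (relation "s") from the implicit feedback matrix.
FEEDBACK = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i3"),
            ("u3", "i2"), ("u3", "i3"), ("u4", "i2"), ("u4", "i4")]

# ASSUMPTION: hypothetical item-entity edges, invented to match the example.
PROPERTIES = [("i2", "p2", "e1"), ("i1", "p1", "e1"),
              ("i3", "p2", "e2"), ("i1", "p1", "e2"),
              ("i2", "p2", "e3"), ("e3", "p3", "e5"),
              ("i1", "p1", "e5"), ("i4", "p2", "e4")]

def build_graph():
    """Adjacency list of (relation, neighbour); edges are walked both ways."""
    adj = defaultdict(list)
    for user, item in FEEDBACK:
        adj[user].append(("s", item))
        adj[item].append(("s", user))
    for a, rel, b in PROPERTIES:
        adj[a].append((rel, b))
        adj[b].append((rel, a))
    return adj

def count_paths(adj, user, item, max_len=4):
    """Count acyclic paths from user to item, grouped by relation sequence."""
    counts = Counter()

    def walk(node, visited, relations):
        if node == item:
            counts[tuple(relations)] += 1
            return
        if len(relations) == max_len:
            return
        for rel, nxt in adj[node]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, relations + [rel])

    walk(user, {user}, [])
    return counts

def path_features(counts):
    """Normalise each path count by the total number of paths."""
    total = sum(counts.values())
    return {path: c / total for path, c in counts.items()} if total else {}

COUNTS = count_paths(build_graph(), "u3", "i1")
FEATURES = path_features(COUNTS)  # 2/5, 2/5 and 1/5 on this toy graph
```

Enumerating every path is only practical for small graphs and short paths; on a graph the size of DBpedia the maximum path length would have to stay small.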
Learning the ranking function
Point-wise Learning To Rank
• Simplest LTR technique
• Very effective in practice (the best solution in the Yahoo! Learning to Rank Challenge used extremely randomized trees in a standard regression setting)
Learn a prediction function $f: \mathbb{R}^D \to \mathbb{R}$ s.t. $f(x_{ui}) \approx \hat{s}_{ui}$
Assumption: if f is accurate, then the ranking induced by f should be close to the desired ranking
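Under that assumption, a top-N list is obtained by scoring every candidate item with f and sorting. A minimal Python sketch, where `top_n`, the `features` callback and the toy scores are hypothetical names and values used only for illustration:

```python
def top_n(user, candidates, features, f, n=5):
    """Rank candidate items for a user by the predicted score f(x_ui)."""
    scored = [(f(features(user, item)), item) for item in candidates]
    # Highest predicted relevance first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:n]]

# Toy usage: a one-dimensional feature vector and an identity-like scorer.
TOY_SCORES = {"a": 0.1, "b": 0.9, "c": 0.5}
RANKED = top_n("u", ["a", "b", "c"],
               features=lambda u, i: [TOY_SCORES[i]],
               f=lambda x: x[0],
               n=2)
```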
BagBoo
BagBoo: a scalable hybrid bagging-the-boosting model [D. Pavlov, A. Gorodilov, C. Brunk CIKM2010]
• Combination of Random Forest (Bagging) and Gradient Boosted Regression Trees (Boosting)
• Combines the high accuracy of gradient boosting with the resistance to overfitting of random forests
For b = 1 to B:
    $T_b \leftarrow$ sample from $TR$
    $f_b \leftarrow$ learn GBRT from $T_b$
$f = \frac{1}{B} \sum_{b=1}^{B} f_b$
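The bagging-the-boosting idea can be sketched in plain Python. This is not the BagBoo implementation: one-split regression stumps stand in for full regression trees, bags are drawn by sampling with replacement, and the function names and the parameters `n_trees`, `lr` and `n_bags` are assumptions.

```python
import random

def fit_stump(X, r):
    """Best single-split regression stump for targets r (squared error)."""
    n, best = len(X), None
    for d in range(len(X[0])):
        for t in sorted({x[d] for x in X}):
            left = [r[k] for k in range(n) if X[k][d] <= t]
            right = [r[k] for k in range(n) if X[k][d] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or err < best[0]:
                best = (err, d, t, ml, mr)
    if best is None:  # no valid split: predict the mean target
        mean = sum(r) / n
        return lambda x: mean
    _, d, t, ml, mr = best
    return lambda x: ml if x[d] <= t else mr

def fit_gbrt(X, y, n_trees=20, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = sum(y) / len(y)
    pred, stumps = [base] * len(y), []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, X)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

def fit_bagboo(X, y, n_bags=5, seed=0, **gbrt_args):
    """Average n_bags boosted models, each trained on a resampled bag."""
    rng = random.Random(seed)
    n, models = len(X), []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_gbrt([X[k] for k in idx], [y[k] for k in idx], **gbrt_args))
    return lambda x: sum(m(x) for m in models) / n_bags
```

Each boosted model depends only on its own bag, so the B models can be trained in parallel, which is where the scalability of the approach comes from.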
Evaluation Methodology
• Top-N Item recommendation task
• Evaluation methodology similar to: [Cremonesi, Koren and Turrin, RecSys 2010]
• Evaluation with different user profile sizes: given 5, given 10, ..., given All
[Figure: split of each user's data into user profile and test set for the settings given 5, given 10, ..., given All.]
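A minimal Python sketch of a "given k" split and of recall@N, the metric reported in the result plots. The exact protocol is not spelled out on the slide, so the split rule (last `n_test` items held out, first `k` of the remaining items kept as profile) and the recall definition (hits in the top N divided by the number of test items) are assumptions, as are the function names.

```python
def given_k_split(items, n_test, k=None):
    """Hold out the last n_test items; keep k profile items (all if k is None)."""
    if n_test < 1 or n_test >= len(items):
        raise ValueError("n_test must be between 1 and len(items) - 1")
    test = items[-n_test:]
    rest = items[:-n_test]
    profile = rest if k is None else rest[:k]
    return profile, test

def recall_at_n(ranked, test_items, n=5):
    """Fraction of the test items that appear in the top-n of the ranking."""
    if not test_items:
        return 0.0
    hits = len(set(ranked[:n]) & set(test_items))
    return hits / len(test_items)

# Toy usage with hypothetical item identifiers.
PROFILE, TEST = given_k_split(list("abcdefgh"), n_test=3, k=2)
RECALL = recall_at_n(["f", "x", "g", "y", "z", "h"], TEST, n=5)
```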
Datasets
Subset of Movielens mapped to DBpedia: 3,792 users, 2,795 movies, 104,351 entities
Subset of Last.fm mapped to DBpedia: 852 users, 6,256 artists, 150,925 entities
Mappings: http://sisinflab.poliba.it/mappingdatasets2dbpedia.zip
Evaluation of different ranking functions
[Figure: recall@5 versus user profile size (given 5, given 10, given 20, given 30, given 50, given All) on Movielens for BagBoo, GBRT and Sum.]
[Figure: recall@5 versus user profile size (given 5, given 10, given 20, given All) on Last.fm for BagBoo, GBRT and Sum.]
Comparative approaches
• BPRMF, Bayesian Personalized Ranking for Matrix Factorization
• BPRLin, Linear Model optimized for BPR (Hybrid alg.)
• SLIM, Sparse Linear Methods for Top-N Recommender Systems
• SMRMF, Soft Margin Ranking Matrix Factorization
MyMediaLite
Comparison with other approaches
[Figure: recall@5 versus user profile size (given 5, given 10, given 20, given 30, given 50, given All) on Movielens for SPrank, BPRMF, SLIM, BPRLin and SMRMF.]
[Figure: recall@5 versus user profile size (given 5, given 10, given 20, given All) on Last.fm for SPrank, BPRMF, SLIM, BPRLin and SMRMF.]
Contributions
SPrank: Semantic Path-based ranking
Combination of semantic item descriptions from the Web of Data and implicit feedback
Mining of the semantic graph using path-based features
Learning To Rank setting
Future Work:
Deeper analysis of the path-based features
Usage of other Learning To Rank approaches
Q & A
A Little Semantics Goes a Long Way.
Hendler Hypothesis