Factorization Machines Leveraging Lightweight Linked Open Data-enabled Features for Top-N Recommendations
Guangyuan Piao, John G. Breslin
Insight Centre for Data Analytics, National University of Ireland Galway
The 18th International Conference on Web Information Systems Engineering (WISE 2017), Moscow, Russia, 7-10 October 2017
Background
Linked Open Data (LOD) provides domain knowledge and rich information about items for content-based recommender systems
[source]: http://lod-cloud.net
DBpedia knowledge base
• 1st class citizen in LOD cloud
• Structured information from Wikipedia
• 4.58 million things
• 1,445,000 persons, 87,000 films etc.
Background Knowledge from DBpedia
• Knowledge is represented as SPO triples
• SPO: Subject → Property → Object
• Knowledge is freely accessible via a public SPARQL Endpoint
[Figure: example triples with the Subject, Property (e.g. musicComposer), and Object labeled; objects shown include the categories Chase_films, Auto_racing_films, …]
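As a rough illustration of how SPO triples for one resource could be requested from a public SPARQL Endpoint, the sketch below builds a query and the corresponding HTTP GET URL. The endpoint URL `https://dbpedia.org/sparql`, the `format` parameter, and the helper names are assumptions for illustration and are not taken from the slides; the code only constructs the request and does not send it.

```python
from urllib.parse import urlencode

# Assumed public DBpedia endpoint; the slides do not state the URL.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_triple_query(resource_uri: str, limit: int = 100) -> str:
    """Return a SPARQL query for triples whose subject is `resource_uri`."""
    return (
        "SELECT ?property ?object WHERE { "
        f"<{resource_uri}> ?property ?object . "
        "} "
        f"LIMIT {limit}"
    )


def build_request_url(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> str:
    """Encode the query as an HTTP GET URL that asks for JSON results."""
    # The `format` parameter is an assumption about the endpoint software.
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"


query = build_triple_query("http://dbpedia.org/resource/The_Godfather", limit=10)
url = build_request_url(query)
print(query)
print(url)
```

Swapping the positions of the resource and the `?object` variable in the pattern would return the triples in which the resource appears as the object instead of the subject.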
(Some) Related Work
• Semantic Similarity/Distance Measures
  • [Passant et al. ISWC’10, AAAI’10]
  • [Piao et al. SAC’16]
• Graph-based algorithms such as PageRank
  • [Musto et al. UMAP’16]
  • [Nguyen et al. WWW’15]
• Machine learning approaches
  • [Noia et al. RecSys’12], VSM + SVM classifier
  • [Noia et al. TIST’16], semantic paths + learning-to-rank (SPrank)
Combined Graph
[Figure: pipeline combining user-item interactions with item background knowledge from the SPARQL Endpoint: build a graph → extract features → feed to algorithms]
Proposed Approach: Features
[Figure: user-item interactions and item background knowledge from the SPARQL Endpoint are fed to algorithms; the example shows dbr:The_Godfather linked to dbr:Carlo_Savina, dbr:Francis_Ford_Coppola, dbr:The_Godfather_Returns, and dbc:Gangster_films via dbo:knownFor, dbo:director, dbo:series, and dc:subject]
• Using lightweight LOD features from DBpedia
  • lightweight: directly obtained via SPARQL Endpoint
• LOD features
  • Property-Object list (PO)
  • Subject-Property list (SP)
  • PageRank score (PR)
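The two list features can be sketched in a few lines. This is an illustrative reading, not the authors' code: it treats PO as the (property, object) pairs of triples in which the item is the subject, and SP as the (subject, property) pairs of triples in which the item is the object. The edge directions in `TRIPLES` are assumed from the labels on the slide, and the function names are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Assumed edge directions; the slide shows only the node and edge labels.
TRIPLES: List[Triple] = [
    ("dbr:The_Godfather", "dbo:director", "dbr:Francis_Ford_Coppola"),
    ("dbr:The_Godfather", "dc:subject", "dbc:Gangster_films"),
    ("dbr:Carlo_Savina", "dbo:knownFor", "dbr:The_Godfather"),
    ("dbr:The_Godfather_Returns", "dbo:series", "dbr:The_Godfather"),
]


def property_object_list(item: str, triples: List[Triple]) -> List[str]:
    """PO features: property|object pairs of triples whose subject is the item."""
    return [f"{p}|{o}" for s, p, o in triples if s == item]


def subject_property_list(item: str, triples: List[Triple]) -> List[str]:
    """SP features: subject|property pairs of triples whose object is the item."""
    return [f"{s}|{p}" for s, p, o in triples if o == item]


po = property_object_list("dbr:The_Godfather", TRIPLES)
sp = subject_property_list("dbr:The_Godfather", TRIPLES)
print(po)
print(sp)
```

Both lists need only the triples directly attached to the item, which is what makes them obtainable with a single endpoint query per direction.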
Proposed Approach: Algorithms
• Factorization Machines (FMs)
• Optimization: Bayesian Personalized Ranking (BPR)
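For readers unfamiliar with the two building blocks, here is a minimal sketch of a standard second-order FM score and the BPR pairwise loss. It illustrates the general techniques only; the parameter values, feature indices, and function names are made up for the example, and no training loop is shown.

```python
import math
from typing import Dict, List

SparseVector = Dict[int, float]  # feature index -> value


def fm_score(x: SparseVector, w0: float, w: List[float], V: List[List[float]]) -> float:
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""
    linear = sum(w[i] * xi for i, xi in x.items())
    num_factors = len(V[0])  # dimensionality of the latent factor vectors
    pairwise = 0.0
    for f in range(num_factors):
        # Identity: sum_{i<j} a_i a_j = 0.5 * ((sum a_i)^2 - sum a_i^2)
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def bpr_loss(score_pos: float, score_neg: float) -> float:
    """BPR pairwise loss: -ln(sigmoid(score_pos - score_neg))."""
    diff = score_pos - score_neg
    # Equals log(1 + exp(-diff)), written to stay stable for large |diff|.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Hypothetical toy parameters: 3 features, 2 latent factors.
w0 = 0.1
w = [0.2, -0.1, 0.05]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
x_pos = {0: 1.0, 1: 1.0}  # row for an item the user interacted with
x_neg = {0: 1.0, 2: 1.0}  # row for an item the user did not interact with

score_pos = fm_score(x_pos, w0, w, V)
score_neg = fm_score(x_neg, w0, w, V)
print(score_pos, score_neg, bpr_loss(score_pos, score_neg))
```

BPR lowers the loss when the observed item is scored above the unobserved one, so training optimizes the ranking order rather than the absolute score values.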
Proposed Approach
• Overall features for Factorization Machines

      Feature vector x                                   Target y
      user     item     PO           SP         PR
x1    1 0 …    1 0 …    0.2 0.2 …    0.1 0 …    0.1      1
x2    0 1 …    0 1 …    0.3 0.5 …    0 0.3 …    0.2      0
…     …        …        …            …          …        …
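One way such a row could be assembled is sketched below as a sparse dictionary with one block per feature group. The weighting of PO and SP entries (1 divided by the list length) is an assumption made for the example, since the slide does not state how the fractional values are computed; the user id, the PageRank value, and the function name are likewise hypothetical.

```python
from typing import Dict, List


def build_feature_vector(
    user: str,
    item: str,
    po: List[str],
    sp: List[str],
    pagerank: float,
) -> Dict[str, float]:
    """Assemble one sparse FM input row: user, item, PO, SP, and PR blocks."""
    # One-hot entries for the user and item blocks.
    x: Dict[str, float] = {f"user={user}": 1.0, f"item={item}": 1.0}
    # Assumption: each PO/SP entry is weighted 1/len(list), so a block sums to 1.
    for feature in po:
        x[f"po={feature}"] = 1.0 / len(po)
    for feature in sp:
        x[f"sp={feature}"] = 1.0 / len(sp)
    # Single real-valued entry for the item's PageRank score.
    x["pr"] = pagerank
    return x


row = build_feature_vector(
    user="u1",  # hypothetical user id
    item="dbr:The_Godfather",
    po=["dbo:director|dbr:Francis_Ford_Coppola", "dc:subject|dbc:Gangster_films"],
    sp=["dbr:Carlo_Savina|dbo:knownFor", "dbr:The_Godfather_Returns|dbo:series"],
    pagerank=0.1,  # hypothetical score
)
print(row)
```

Keeping the row sparse matters because the user, item, PO, and SP blocks are mostly zeros for any single user-item pair.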
Experimental Setup: Dataset
• MovieLens dataset for LOD-enabled recommender systems
• 80% for the training set and 20% for the test set
Experimental Setup: Evaluation Metrics
• P@N: the precision at rank N
• R@N: the recall at rank N
• nDCG@N: normalized Discounted Cumulative Gain at rank N
• MRR: Mean Reciprocal Rank
• MAP: Mean Average Precision
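The metrics above can be sketched for a single user as follows, assuming binary relevance (an item is either in the user's test set or not). These are common textbook definitions and may differ in detail from the evaluation code behind the reported results; MRR and MAP are obtained by averaging `reciprocal_rank` and `average_precision` over all users.

```python
import math
from typing import List, Set


def precision_at_n(ranked: List[str], relevant: Set[str], n: int) -> float:
    """Fraction of the top-n recommendations that are relevant."""
    return sum(1 for item in ranked[:n] if item in relevant) / n


def recall_at_n(ranked: List[str], relevant: Set[str], n: int) -> float:
    """Fraction of the relevant items that appear in the top n."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:n] if item in relevant) / len(relevant)


def ndcg_at_n(ranked: List[str], relevant: Set[str], n: int) -> float:
    """DCG of the top n divided by the DCG of an ideal ranking (binary gains)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:n], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0 if none is ranked."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant item occurs."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


ranked = ["a", "b", "c", "d"]  # hypothetical recommendation list
relevant = {"a", "c"}          # hypothetical test-set items
print(precision_at_n(ranked, relevant, 2), recall_at_n(ranked, relevant, 2))
print(ndcg_at_n(ranked, relevant, 4))
print(reciprocal_rank(ranked, relevant), average_precision(ranked, relevant))
```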
Experimental Setup: Compared Methods
• PopRank: popularity-based baseline approach
• kNN-item: item-based k-nearest neighbors algorithm
• BPRMF: matrix factorization with the BPR optimization
• SPrank: learning-to-rank using semantic paths based on LOD
• LODFM: our proposed approach
Results
best tuned parameters: m=200, PO+PR
Model Analysis: Features (m=10)
Model Analysis: Dimensionality
Conclusions
• LODFM provides state-of-the-art performance
• Using FMs with lightweight LOD-enabled features
  • directly obtained via a public SPARQL Endpoint of DBpedia
  • without maintaining a graph and extracting features from it
• Useful features: Property-Object list & PageRank
• Future work
  • investigate other lightweight LOD-enabled features
  • evaluate on datasets from other domains
Guangyuan Piao
twitter: https://twitter.com/parklize
slideshare: http://www.slideshare.net/parklize