Movie Recommendation with DBpedia - IIR 2012

3rd Italian Information Retrieval Workshop (IIR 2012) Bari January 26, 2012 MOVIE RECOMMENDATION WITH DBPEDIA Politecnico di Bari Via Orabona, 4 70125 Bari (ITALY) Roberto Mirizzi, Tommaso Di Noia, Azzurra Ragone, Vito Claudio Ostuni, Eugenio Di Sciascio [email protected], [email protected] , [email protected], [email protected], [email protected]


DESCRIPTION

Movie Recommendation with DBpedia
Roberto Mirizzi, Tommaso Di Noia, Azzurra Ragone, Vito Claudio Ostuni, Eugenio Di Sciascio
3rd Italian Information Retrieval Workshop (IIR 2012) - Bari, January 26, 2012

In this paper we present MORE (acronym of MORE than MOvie REcommendation), a Facebook application that semantically recommends movies to the user leveraging the knowledge within Linked Data and the information elicited from her profile. MORE exploits the power of social knowledge bases (e.g. DBpedia) to detect semantic similarities among movies. These similarities are computed by a Semantic version of the classical Vector Space Model (sVSM), applied to semantic datasets. Precision and recall experiments prove the validity of our approach for movie recommendation. MORE is freely available as a Facebook application.

TRANSCRIPT

Page 1: Movie Recommendation with DBpedia - IIR 2012

3rd Italian Information Retrieval Workshop (IIR 2012) – Bari January 26, 2012

MOVIE RECOMMENDATION WITH DBPEDIA

Politecnico di Bari

Via Orabona, 4

70125 Bari (ITALY)

Roberto Mirizzi, Tommaso Di Noia, Azzurra Ragone, Vito Claudio Ostuni, Eugenio Di Sciascio [email protected], [email protected] , [email protected], [email protected], [email protected]

Page 2: Movie Recommendation with DBpedia - IIR 2012

Outline

DBpedia: a nucleus for a Web of Open Data
    Social knowledge bases for similarity detection

Semantic Vector Space Model
    Vector Space Model adapted to RDF graphs

MORE: More than Movie Recommendation
    Content-based recommendation in action

Evaluation
    Precision and Recall experiments with MovieLens

Conclusion

Page 3: Movie Recommendation with DBpedia - IIR 2012

What is Linked Data?

Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. More specifically, Wikipedia defines Linked Data as “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” [www.linkeddata.org]

Page 4: Movie Recommendation with DBpedia - IIR 2012

DBpedia: a Nucleus for a Web of Data (i)

Page 5: Movie Recommendation with DBpedia - IIR 2012

DBpedia: a Nucleus for a Web of Data (ii)

Let’s use all this knowledge to build smarter content-based recommender systems

The DBpedia knowledge base currently describes more than 3.64 million things, highly interconnected in the RDF graph.

Page 6: Movie Recommendation with DBpedia - IIR 2012

Social KBs for similarity detection

[Figure: RDF graph fragment around Ocean’s Eleven and Ocean’s Twelve. Nodes shown: George Clooney, Brad Pitt, Steven Soderbergh, Catherine Zeta-Jones, 2000s crime films, American criminal comedy films, Crime films, Crime.]

Page 7: Movie Recommendation with DBpedia - IIR 2012

Semantic Vector Space Model (i)

[http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]

Quick recap on the Vector Space Model

The Vector Space Model is an algebraic model that represents both text documents and queries as vectors of index-term weights w_{t,d}, which are positive and non-binary.

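$$\mathbf{v}_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T$$

$$w_{t,d} = tf_{t,d} \cdot idf_t$$

$$tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}$$

$$idf_t = \log \frac{|D|}{|\{d' \in D : t \in d'\}|}$$

$$sim(d_j, q) = \frac{\mathbf{d}_j \cdot \mathbf{q}}{\|\mathbf{d}_j\| \, \|\mathbf{q}\|} = \frac{\sum_{i=1}^{N} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \, \sqrt{\sum_{i=1}^{N} w_{i,q}^2}}$$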
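The tf-idf weighting and cosine similarity recapped on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the three toy documents and the function names `tf_idf_vectors` and `cosine` are assumptions made for the example, and a base-10 logarithm is assumed for idf.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each document name to a sparse {term: tf * idf} vector."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for terms in docs.values() for t in set(terms))
    vectors = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        total = sum(counts.values())
        vectors[name] = {
            t: (c / total) * math.log10(n_docs / df[t]) for t, c in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, not taken from the slides.
DOCS = {
    "d1": ["crime", "heist", "casino"],
    "d2": ["crime", "heist", "europe"],
    "d3": ["romance", "paris"],
}
VECTORS = tf_idf_vectors(DOCS)
```

Documents that share terms (`d1` and `d2`) get a similarity strictly between 0 and 1, while documents with no terms in common (`d1` and `d3`) get 0.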

Page 8: Movie Recommendation with DBpedia - IIR 2012

Each resource (movie) is expressed as a tensor in a multi-dimensional space where each dimension corresponds to a specific property of the considered datasets (e.g., starring, subject/broader, director, genre, …)

Semantic Vector Space Model (ii)

[Figure: the RDF graph of Ocean’s Eleven and Ocean’s Twelve rewritten as a tensor with one slice per property. Properties: starring, director, subject/broader, genre. Resources: George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, 2000s crime films, Crime films, American criminal…, Crime. The starring slice has rows Ocean’s Eleven and Ocean’s Twelve and columns George Clooney, Brad Pitt, Catherine Zeta-Jones.]

Vector Space Model applied to RDF graphs
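As a rough illustration of the per-property view described on this slide, the sketch below groups RDF-like triples by property and turns one property slice into a binary movie-by-resource matrix. The triples are a small hand-written sample (not a DBpedia export), and the helper names `slice_by_property` and `incidence_matrix` are hypothetical.

```python
from collections import defaultdict

# (movie, property, resource) triples; a tiny hand-written sample for illustration.
TRIPLES = [
    ("Ocean's Eleven", "starring", "George Clooney"),
    ("Ocean's Eleven", "starring", "Brad Pitt"),
    ("Ocean's Eleven", "director", "Steven Soderbergh"),
    ("Ocean's Twelve", "starring", "George Clooney"),
    ("Ocean's Twelve", "starring", "Brad Pitt"),
    ("Ocean's Twelve", "starring", "Catherine Zeta-Jones"),
    ("Ocean's Twelve", "director", "Steven Soderbergh"),
]


def slice_by_property(triples):
    """Group triples into one {movie: set of resources} map per property."""
    slices = defaultdict(lambda: defaultdict(set))
    for movie, prop, resource in triples:
        slices[prop][movie].add(resource)
    return slices


def incidence_matrix(prop_slice):
    """Turn one property slice into a binary movie x resource matrix."""
    movies = sorted(prop_slice)
    resources = sorted({r for objs in prop_slice.values() for r in objs})
    rows = [[1 if r in prop_slice[m] else 0 for r in resources] for m in movies]
    return movies, resources, rows


slices = slice_by_property(TRIPLES)
movies, actors, matrix = incidence_matrix(slices["starring"])
```

Each property then gets its own matrix, so a similarity can be computed slice by slice before the slices are combined.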

Page 9: Movie Recommendation with DBpedia - IIR 2012

Semantic Vector Space Model (iii)

[Figure: the starring slice for Ocean’s Eleven [o11] (13 actors) and Ocean’s Twelve [o12] (15 actors), with George Clooney [gc] (38 movies), Catherine Z. Jones [czj] (22 movies) and Brad Pitt [bp] (35 movies).]

$$w_{actor_x, movie_y} = tf_{actor_x, movie_y} \cdot idf_{actor_x}$$

$$sim^{starring}(o_{12}, o_{11}) = \frac{w_{gc,o_{12}} \, w_{gc,o_{11}} + w_{czj,o_{12}} \, w_{czj,o_{11}} + w_{bp,o_{12}} \, w_{bp,o_{11}}}{\sqrt{w_{gc,o_{12}}^2 + w_{czj,o_{12}}^2 + w_{bp,o_{12}}^2} \cdot \sqrt{w_{gc,o_{11}}^2 + w_{bp,o_{11}}^2}}$$

Page 10: Movie Recommendation with DBpedia - IIR 2012

Semantic Vector Space Model (iv)

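$$w_{gc,o_{11}} = tf_{gc,o_{11}} \cdot idf_{gc} = \frac{1}{13} \cdot \log\frac{49184}{38} = 0.239$$

$$w_{gc,o_{12}} = tf_{gc,o_{12}} \cdot idf_{gc} = \frac{1}{15} \cdot \log\frac{49184}{38} = 0.207$$

$$w_{czj,o_{11}} = tf_{czj,o_{11}} \cdot idf_{czj} = 0 \cdot \log\frac{49184}{22} = 0$$

$$w_{czj,o_{12}} = tf_{czj,o_{12}} \cdot idf_{czj} = \frac{1}{15} \cdot \log\frac{49184}{22} = 0.223$$

$$w_{bp,o_{11}} = tf_{bp,o_{11}} \cdot idf_{bp} = \frac{1}{13} \cdot \log\frac{49184}{35} = 0.242$$

$$w_{bp,o_{12}} = tf_{bp,o_{12}} \cdot idf_{bp} = \frac{1}{15} \cdot \log\frac{49184}{35} = 0.210$$

$$\alpha_{starring} \cdot sim^{starring}(o_{12}, o_{11}) + \alpha_{genre} \cdot sim^{genre}(o_{12}, o_{11}) + \alpha_{subject} \cdot sim^{subject}(o_{12}, o_{11}) + \ldots = sim(o_{12}, o_{11})$$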
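The weights on this slide can be checked with a short Python sketch. It assumes that 49184 is the total number of movies in the dataset, that tf is 1 divided by the number of actors in the movie when the actor appears in it (0 otherwise), and that the logarithm is base 10, which is the base that matches the slide's figures. The function names are hypothetical, and only the three actors shown on the slides are considered.

```python
import math

# Figures read off the slides. TOTAL_MOVIES is assumed to be the dataset size |D|.
TOTAL_MOVIES = 49184
MOVIES_PER_ACTOR = {"gc": 38, "czj": 22, "bp": 35}
ACTORS_PER_MOVIE = {"o11": 13, "o12": 15}
# Cast restricted to the three actors shown on the slides.
CAST = {"o11": {"gc", "bp"}, "o12": {"gc", "czj", "bp"}}


def weight(actor, movie):
    """tf-idf weight of an actor for a movie (tf = 1/#actors if cast, else 0)."""
    tf = 1 / ACTORS_PER_MOVIE[movie] if actor in CAST[movie] else 0.0
    idf = math.log10(TOTAL_MOVIES / MOVIES_PER_ACTOR[actor])
    return tf * idf


def starring_similarity(movie_a, movie_b, actors=("gc", "czj", "bp")):
    """Cosine similarity between two movies over the given actors."""
    va = [weight(a, movie_a) for a in actors]
    vb = [weight(a, movie_b) for a in actors]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


# Weights rounded to three decimals, as on the slide.
weights = {(a, m): round(weight(a, m), 3) for m in CAST for a in MOVIES_PER_ACTOR}
sim_starring = starring_similarity("o12", "o11")
```

The same cosine computation would be repeated for each property (genre, subject, and so on) before the per-property similarities are weighted and summed.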

Page 11: Movie Recommendation with DBpedia - IIR 2012

MORE: More than Movie Recommendation

http://apps.facebook.com/movie-recommendation/

MORE is a Facebook application that semantically recommends movies to the user leveraging the knowledge within DBpedia. MORE supports the user in exploratory browsing tasks by guiding their search through a semantic knowledge space. Similarities between movies are computed by a Semantic version of the classical Vector Space Model (sVSM), applied to semantic datasets.

Page 12: Movie Recommendation with DBpedia - IIR 2012

Semantic Content-based Recommender

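Given a user profile, defined as:

$$profile(u) = \{ m_j \mid u \text{ likes } m_j \}$$

we compute a similarity between a candidate movie m_i and the information encoded in profile(u):

$$r(u, m_i) = \frac{\sum_{m_j \in profile(u)} \frac{1}{P} \sum_{p} \alpha_p \cdot sim^p(m_j, m_i)}{|profile(u)|}$$

where P is the number of properties considered and α_p is the weight of property p.

If this similarity is greater than or equal to 0.5, we suggest the movie m_i to the user u.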
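The recommendation rule on this slide can be sketched as follows. This is a minimal illustration, not the MORE implementation: the per-property similarity tables, the property weights in `ALPHAS`, and the movie names are hypothetical values chosen so the arithmetic is easy to follow.

```python
def rating(profile, candidate, sims, alphas):
    """Average, over liked movies, of the weighted per-property similarity."""
    if not profile:
        return 0.0
    n_props = len(alphas)
    total = 0.0
    for liked in profile:
        pair = frozenset((liked, candidate))
        # Weighted sum over properties, normalised by the number of properties.
        total += sum(a * sims[p].get(pair, 0.0) for p, a in alphas.items()) / n_props
    return total / len(profile)


def recommend(profile, candidates, sims, alphas, threshold=0.5):
    """Suggest every unseen candidate whose rating reaches the threshold."""
    return [
        m for m in candidates
        if m not in profile and rating(profile, m, sims, alphas) >= threshold
    ]


# Hypothetical per-property similarities, keyed by unordered movie pairs.
SIMS = {
    "starring": {frozenset(("Ocean's Eleven", "Ocean's Twelve")): 0.8},
    "director": {frozenset(("Ocean's Eleven", "Ocean's Twelve")): 1.0},
}
ALPHAS = {"starring": 1.0, "director": 0.6}  # hypothetical property weights
PROFILE = {"Ocean's Eleven"}
CANDIDATES = ["Ocean's Twelve", "Unrelated Movie"]

suggested = recommend(PROFILE, CANDIDATES, SIMS, ALPHAS)
```

With these made-up numbers the first candidate scores (1.0 * 0.8 + 0.6 * 1.0) / 2 = 0.7 and passes the 0.5 threshold, while the unrelated movie scores 0 and is filtered out.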

Page 13: Movie Recommendation with DBpedia - IIR 2012

Training the system

To identify the best possible values for the coefficients α_p (i.e., the weights associated with each property), we train the system via a genetic algorithm, adopting an N-fold cross-validation approach (with N = 5) on the 100k MovieLens dataset. At the end we obtain a set A_p = {α_p^1, …, α_p^5} of 5 different values for each α_p, e.g.:

Then we evaluate performance with standard precision and recall tests, when α_p is one of the following:

min(A_p), max(A_p), avg(A_p), median(A_p), lowestError(A_p)
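The five ways of collapsing the per-fold values into a single coefficient can be sketched as below. The fold values and errors are hypothetical, and `lowestError` is interpreted here as "the value from the fold whose run had the smallest misclassification error", which is an assumption about the slide's shorthand.

```python
import statistics


def choose_alpha(fold_values, fold_errors, strategy):
    """Collapse the per-fold values of one coefficient into a single value."""
    if strategy == "min":
        return min(fold_values)
    if strategy == "max":
        return max(fold_values)
    if strategy == "avg":
        return statistics.mean(fold_values)
    if strategy == "median":
        return statistics.median(fold_values)
    if strategy == "lowestError":
        # Value learned in the fold with the smallest misclassification error.
        best_fold = fold_errors.index(min(fold_errors))
        return fold_values[best_fold]
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical values for one property across the 5 folds.
FOLD_VALUES = [0.2, 0.5, 0.3, 0.9, 0.4]
FOLD_ERRORS = [0.30, 0.21, 0.28, 0.40, 0.25]

chosen = {
    s: choose_alpha(FOLD_VALUES, FOLD_ERRORS, s)
    for s in ("min", "max", "avg", "median", "lowestError")
}
```

Running the precision and recall tests once per strategy then shows which way of aggregating the folds works best.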

Page 14: Movie Recommendation with DBpedia - IIR 2012

Evaluation: Precision & Recall

$$P@N = \frac{|Rec@N \cap TestSet|}{N}$$

$$R@N = \frac{|Rec@N \cap TestSet|}{|TestSet|}$$

N = 3, 4, 5, 6, 7

The figure shows high values of Precision and Recall. The best values are obtained by choosing, for the coefficients α_p, the values with the lowest misclassification error on A_p.

We also evaluated the importance of the subject/broader property. The information in this property is peculiar to ontological datasets. As shown in the figure, performance drops drastically if we do not consider this property.
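The Precision@N and Recall@N measures used on this slide can be sketched as follows. The ranked list and test set are hypothetical placeholders, and the function names are chosen for the example.

```python
def precision_at_n(ranked, test_set, n):
    """Fraction of the top-n recommendations that appear in the test set."""
    hits = len(set(ranked[:n]) & set(test_set))
    return hits / n


def recall_at_n(ranked, test_set, n):
    """Fraction of the test set recovered by the top-n recommendations."""
    if not test_set:
        return 0.0
    hits = len(set(ranked[:n]) & set(test_set))
    return hits / len(test_set)


# Hypothetical ranked recommendations and held-out liked movies.
RANKED = ["a", "b", "c", "d", "e"]
TEST_SET = {"b", "d", "x"}

scores = {
    n: (precision_at_n(RANKED, TEST_SET, n), recall_at_n(RANKED, TEST_SET, n))
    for n in (3, 5)
}
```

Precision divides the hits by N, so it tends to fall as N grows, while recall divides by the test-set size and can only stay flat or rise.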

Page 15: Movie Recommendation with DBpedia - IIR 2012

Conclusion & Future directions

The huge amount of data available on DBpedia can be successfully exploited to build content-based recommender systems.

We have presented MORE, a Facebook application that leverages the knowledge within DBpedia to produce movie recommendations by means of a semantic version of the classical vector space model (sVSM).

Evaluation against historical datasets and high values of precision and recall prove the validity of our approach.

We are currently working on:
    Testing the approach with different domains
    Improving the recommendation with a hybrid approach (content-based and collaborative filtering)

We acknowledge partial support of HP IRP 2011. Grant CW267313.

Page 16: Movie Recommendation with DBpedia - IIR 2012

Q? A!