Linked Data-based Concept Recommendation: Comparison of Different Methods in Open Innovation Scenario
Danica Damljanovic, Milan Stankovic, Philippe Laublet

Posted on 25-Jun-2015

DESCRIPTION

Concept recommendation is a widely used technique aimed at helping users choose the right tags, improve their Web search experience, and carry out a multitude of other tasks. In finding potential problem solvers in Open Innovation (OI) scenarios, concept recommendation is of crucial importance, as it can help discover the right topics, directly or laterally related to an innovation problem. Such topics could then be used to identify relevant experts. We propose two Linked Data-based concept recommendation methods for topic discovery. The first, hyProximity, exploits only the particularities of Linked Data structures, while the second applies a well-known Information Retrieval method, Random Indexing, to Linked Data. We compare the two methods against a baseline in gold standard-based and user study-based evaluations, using real problems and solutions from an OI company.

TRANSCRIPT

Page 1: Linked Data-based Concept Recommendation: Comparison of Different Methods in Open Innovation Scenario

Linked Data-based Concept Recommendation: Comparison of Different Methods in Open Innovation Scenario

Danica Damljanovic, Milan Stankovic, Philippe Laublet

Page 2

Innovation

Page 3

Innovation Platforms

Challenge: Promote innovation problems to an audience of solvers who can propose relevant innovative solutions

Page 4

Finding Meaningful Connections

Clay mining …

Kaolinite extraction from rocks …

Different communities use different terms and concepts to speak about semantically related things. Such "language" defines communities and separates them. Being able to find meaningful connections between concepts would enable us to build bridges between people and content.

http://bit.ly/hyProximity

Page 5

Concept recommendation
• Concepts you might not know but might want to use: to annotate your content, to search for content, to search for people…
• Help problem promoters discover relevant concepts (problem promoters are sometimes not field experts)
• Discovery = relevance + unexpectedness

http://bit.ly/hyProximity

Page 6

Discovering Direct and Lateral Concepts
• hyProximity, a structure-based similarity
• Statistical semantics similarity: Random Indexing, a well-known statistical semantics method from Information Retrieval, applied to RDF

Page 7

Linked Data-based Concept Recommendation

Diagram: textual input → Zemanta → DBpedia concepts found in the text → DBpedia exploration → suggestions

http://bit.ly/hyProximity

Page 8

hyProximity

• We start from several seed concepts found directly in the text and search the DBpedia graph
• The concepts found in the proximity of several seed concepts are considered more "in context" for the given input
• Concepts found at a shorter distance from the seed concepts have a higher hyProximity
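The ranking idea above can be sketched in Python. This is a minimal sketch under simplifying assumptions, not the authors' implementation: the graph is an undirected adjacency map, distance is the shortest hop count found by breadth-first search up to a hypothetical `max_depth`, and each seed adds 1/distance to a candidate's score, so a concept near several seeds accumulates weight. The function names and the toy France graph are illustrative.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency map built from (concept, concept) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, start, max_depth):
    """Shortest hop distance from `start` to every node within `max_depth` hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the search horizon
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def hyproximity_scores(graph, seeds, max_depth=2):
    """Score non-seed concepts: each seed adds 1/distance (hypothetical weighting)."""
    scores = {}
    for seed in seeds:
        for concept, d in bfs_distances(graph, seed, max_depth).items():
            if concept in seeds:
                continue  # seeds are inputs, not suggestions
            scores[concept] = scores.get(concept, 0.0) + 1.0 / d
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy graph echoing the slide example (illustrative only).
EDGES = [
    ("Paris", "Cities in France"),
    ("Seine", "Rivers in France"),
    ("Marne", "Rivers in France"),
    ("Cities in France", "Things in France"),
    ("Rivers in France", "Things in France"),
]

if __name__ == "__main__":
    print(hyproximity_scores(build_graph(EDGES), {"Paris", "Seine"}))
```

With seeds Paris and Seine, "Things in France" is two hops from each seed, so its two contributions of 0.5 add up to the same score as a direct neighbour of a single seed, while Marne (two hops from Seine only) scores 0.5.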

Page 9

Different Distance Functions
• Hierarchical: exploring skos:broader relations
• Transversal: exploring transversal links
• Mixed: a linear combination of hierarchical and transversal

Figure: example graph linking Paris, Seine, Marne, Chanel, BMW and Peugeot to the categories Cities in France, Rivers in France, Things in France, Products of France and Car Industry; edges are either skos:broader or another property, with distances labeled 2, 2, 2 and 2+1

research.hypios.com/hyproximity

Page 10

Different Distance Functions
• Hierarchical: exploring skos:broader relations
• Transversal: exploring transversal links
• Mixed: a linear combination of hierarchical and transversal

Figure: the same example graph with transversal links added (flows through, competitor, famous for "fashion"), each at a distance of 1; legend: skos:broader, other property

research.hypios.com/hyproximity
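The three distance functions can be illustrated with a small Python sketch. It is a hedged illustration, not the published formula: it assumes category links (`dc:subject`, `skos:broader`) are hierarchical and every other predicate is transversal, turns a hop distance into a proximity of 1/distance, and mixes the two with a hypothetical weight `alpha`. The triples are a toy version of the slide graph.

```python
from collections import deque

# Assumption: category links are hierarchical; every other predicate is transversal.
HIERARCHICAL = {"dc:subject", "skos:broader"}


def shortest_distance(triples, source, target, follow):
    """BFS hop count from source to target over triples whose predicate passes `follow`.

    Edges are treated as undirected; returns None if target is unreachable.
    """
    adjacency = {}
    for s, p, o in triples:
        if follow(p):
            adjacency.setdefault(s, set()).add(o)
            adjacency.setdefault(o, set()).add(s)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def proximity(distance):
    """Hypothetical proximity: 1/distance, or 0.0 when unreachable (None) or identical (0)."""
    return 0.0 if not distance else 1.0 / distance


def mixed_proximity(triples, a, b, alpha=0.5):
    """Linear combination of hierarchical and transversal proximity (alpha is illustrative)."""
    hierarchical = shortest_distance(triples, a, b, lambda p: p in HIERARCHICAL)
    transversal = shortest_distance(triples, a, b, lambda p: p not in HIERARCHICAL)
    return alpha * proximity(hierarchical) + (1 - alpha) * proximity(transversal)


# Toy triples echoing the slide figure (illustrative only).
TRIPLES = [
    ("Paris", "skos:broader", "Cities in France"),
    ("Seine", "skos:broader", "Rivers in France"),
    ("Cities in France", "skos:broader", "Things in France"),
    ("Rivers in France", "skos:broader", "Things in France"),
    ("Seine", "flows through", "Paris"),
]

if __name__ == "__main__":
    print(mixed_proximity(TRIPLES, "Paris", "Seine"))  # 0.5 * 1/4 + 0.5 * 1/1
```

Paris and Seine are four hops apart through the category tree but only one hop apart through the transversal "flows through" link, so with `alpha=0.5` the mixed proximity is 0.625; setting `alpha=1.0` falls back to the purely hierarchical 0.25.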

Page 11

Random Indexing
• Words which appear in similar contexts (with the same set of other words) are contextually related, e.g. synonyms.
• Synonyms tend not to co-occur with one another directly, so indirect inference is required to draw associations between words used to express the same idea.

Page 12

Two Steps to Random Indexing

• Indexing
o For an RDF graph, generate virtual documents
o Prepare the corpus (pre-processing)
o Generate the semantic index

• Search: given a term X, calculate the cosine similarity between the vector of that term and the other vectors in the semantic space
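The search step can be sketched in Python as a cosine-similarity ranking over a semantic space held in a dictionary. This is a minimal sketch: the dictionary layout, the function names and the toy vectors are illustrative assumptions, not values from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def search(term, vectors, k=3):
    """Return the k terms whose vectors are closest (by cosine) to the vector of `term`."""
    query = vectors[term]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != term]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]


# Toy semantic space (illustrative values, not from the paper).
SPACE = {
    "clay": [2, 1, 0],
    "kaolinite": [4, 2, 0],
    "fashion": [0, 0, 3],
}

if __name__ == "__main__":
    print(search("clay", SPACE, k=2))
```

Cosine similarity ignores vector length, so "kaolinite" (a scaled copy of "clay") ranks first even though its raw counts are larger, while the orthogonal "fashion" scores 0.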

Page 13

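Building Context Vectors

$$
\underbrace{\begin{pmatrix}
1 & 2 & \cdots & 0 \\
3 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 1 & \cdots & 10
\end{pmatrix}}_{M}
\times
\underbrace{\begin{pmatrix}
0 & 0 & -1 & 1 & -1 & 1 \\
-1 & 1 & 0 & 0 & 1 & -1 \\
\vdots & & & & & \vdots \\
0 & 1 & 0 & -1 & -1 & 1
\end{pmatrix}}_{D}
=
\underbrace{\begin{pmatrix}
t_1 \\ t_2 \\ \vdots \\ t_q
\end{pmatrix}}_{T}
$$

M: term-document matrix (rows t1 … tq, columns d1 … dp); D: random index vectors d1 … dp of the documents (dimensionality = n, fixed seed length); T: the resulting context vectors t1 … tq.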
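The product T = M × D can be sketched in Python. This is a hedged sketch of the general Random Indexing technique, not the authors' code: each document gets a sparse ternary index vector with `seed_length` non-zero slots (alternately +1 and -1, an assumption), and each term's context vector is the frequency-weighted sum of the index vectors of the documents it occurs in. The defaults `dim=6` and `seed_length=4` only mirror the slide's toy vectors; practical indexes use far larger dimensionalities.

```python
import random


def index_vector(dim, seed_length, rng):
    """Sparse ternary index vector: `seed_length` non-zero slots, alternately +1 and -1."""
    vector = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), seed_length)):
        vector[pos] = 1 if i % 2 == 0 else -1
    return vector


def context_vectors(term_doc_counts, dim=6, seed_length=4, seed=42):
    """T = M x D: each term vector is the frequency-weighted sum of its documents' index vectors.

    term_doc_counts maps term -> {document: frequency}, i.e. the non-zero cells of M.
    """
    rng = random.Random(seed)
    documents = sorted({d for counts in term_doc_counts.values() for d in counts})
    D = {doc: index_vector(dim, seed_length, rng) for doc in documents}
    T = {}
    for term, counts in term_doc_counts.items():
        vector = [0] * dim
        for doc, freq in counts.items():
            vector = [v + freq * x for v, x in zip(vector, D[doc])]
        T[term] = vector
    return T, D


# Counts echoing the first rows of M on the slide (other cells assumed zero).
COUNTS = {"t1": {"d1": 1, "d2": 2}, "t2": {"d1": 3}}

if __name__ == "__main__":
    T, D = context_vectors(COUNTS)
    print(D)
    print(T)
```

Because only the index vectors are random, adding a new document never changes the dimensionality: it simply contributes one more sparse vector to the sums.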

Page 14

Indexing: virtual documents

Figure: the representative subgraph for URI=S (S connected through properties P1 to P10 to literals L1 to L8 and to resources O1 and O2) is lexicalised into the virtual document for URI=S, a sequence of statements such as S P3 L3, S P4 O1 and S P7 O2
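Virtual document generation can be sketched in Python. This is a simplified, hypothetical version: it keeps only the triples in which the URI is the subject or the object (one hop, whereas the slide's subgraph also reaches further, e.g. via O1) and lexicalises each term through a label map, keeping unlabeled terms verbatim. The abstract names follow the slide; the labels are made up.

```python
def virtual_document(uri, triples, labels):
    """Lexicalise every triple in which `uri` is subject or object into one text string.

    Terms without an entry in `labels` (e.g. literals) are kept verbatim.
    """
    def lexicalise(term):
        return labels.get(term, term)

    statements = [
        " ".join(lexicalise(t) for t in (s, p, o))
        for s, p, o in triples
        if uri in (s, o)
    ]
    return " ".join(statements)


# Abstract names follow the slide (S, P*, O*, L*); the labels are hypothetical.
TRIPLES = [
    ("S", "P3", "L3"),
    ("S", "P4", "O1"),
    ("O2", "P7", "S"),
    ("O1", "P5", "L4"),  # does not touch S, so it is left out in this one-hop sketch
]
LABELS = {"S": "kaolinite", "P4": "subject", "O1": "clay minerals"}

if __name__ == "__main__":
    print(virtual_document("S", TRIPLES, LABELS))
```

The resulting strings are ordinary text, so the corpus pre-processing and Random Indexing steps can treat each URI's virtual document like any other document.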

Page 15

Experiments
• 26 real innovation problems from Hypios
• Measure of success: the suggested concepts appear in the actual solutions (precision, recall, F-measure)

(+) a reasonable list of concepts from real scenarios
(-) not complete:
o User study: measure discovery = relevance + unexpectedness
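The success measure can be sketched in Python as set-based precision, recall and F-measure per problem. This sketch assumes the gold standard is the set of concept URIs extracted from a problem's actual solutions and that "F-measure" means the balanced F1; any ranking cutoff used in the evaluation is not modelled, and the URIs are hypothetical.

```python
def precision_recall_f1(suggested, gold):
    """Set-based precision, recall and balanced F-measure of suggested vs gold concepts."""
    suggested, gold = set(suggested), set(gold)
    hits = len(suggested & gold)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical URIs: concepts suggested for a problem vs concepts found in its solutions.
SUGGESTED = ["dbr:Kaolinite", "dbr:Clay", "dbr:Mining", "dbr:Fashion"]
GOLD = ["dbr:Kaolinite", "dbr:Clay", "dbr:Ceramic", "dbr:Porcelain"]

if __name__ == "__main__":
    print(precision_recall_f1(SUGGESTED, GOLD))  # (0.5, 0.5, 0.5)
```

Because the gold standard is incomplete, a relevant suggestion that no solution happened to mention counts as a miss here, which is why a user study complements these figures.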

Page 16

DBpedia Dataset
• Select a number of properties relevant to the Open Innovation scenario
• dbo:product, dbp:products, dbo:industry, dbo:service, dbo:genre, and the properties that establish a hierarchical categorization of concepts, namely dc:subject and skos:broader
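Restricting the dataset to the selected properties can be sketched in Python. This is a hypothetical helper, not part of the original work: it keeps only triples whose predicate is on the slide's list and splits them into the hierarchical group (`dc:subject`, `skos:broader`) and the remaining transversal group, which is what the hierarchical and transversal distance functions explore. The `ex:` resources are made up.

```python
# Properties listed on the slide; the split into two groups follows its wording.
HIERARCHICAL = {"dc:subject", "skos:broader"}
TRANSVERSAL = {"dbo:product", "dbp:products", "dbo:industry", "dbo:service", "dbo:genre"}


def select_triples(triples):
    """Keep only triples with a selected property, split into hierarchical and transversal."""
    hierarchical, transversal = [], []
    for triple in triples:
        predicate = triple[1]
        if predicate in HIERARCHICAL:
            hierarchical.append(triple)
        elif predicate in TRANSVERSAL:
            transversal.append(triple)
    return hierarchical, transversal


# Hypothetical triples using made-up ex: resources.
SAMPLE = [
    ("ex:CompanyA", "dbo:industry", "ex:CarIndustry"),
    ("ex:CompanyA", "dc:subject", "ex:CarManufacturers"),
    ("ex:CarManufacturers", "skos:broader", "ex:CarIndustryCategory"),
    ("ex:CompanyA", "rdfs:label", "Company A"),  # not selected: dropped
]

if __name__ == "__main__":
    hier, trans = select_triples(SAMPLE)
    print(len(hier), len(trans))  # 2 1
```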

Page 17

Evaluation
• "Gold standard"
o Extract problem URIs
o Extract solution URIs

• Baseline:
o Google AdWords Keyword Tool: finds similar topics based on their distribution in textual corpora and in corpora of search queries
o Suggests up to 600 concepts, which are then used for Web crawling to find experts

Page 18

Evaluation: Results


Page 19

User Study
• Suggestions that are both relevant and unexpected are the most valuable discoveries for the user
• 12 users
• 34 problem evaluations
o 3060 suggested concepts/keywords (34 evaluations × 3 methods × 30 suggestions)

• For the chosen innovation problem, the evaluators were presented with the lists of the 30 top-ranked suggestions generated by AdWords, hyProximity (mixed approach) and Random Indexing.

Page 20

Example

Page 21

User Study: Results

Page 22

Conclusion
• Linked Data is a valuable source of knowledge for concept recommendation
• Our two methods are complementary
o hyProximity is better for precision
o Random Indexing is better for recall

• User study: unexpectedness is higher with our methods than with the baseline

• Subjective user comments:
o Random Indexing: generic
o hyProximity: granular
o AdWords: redundant

Page 23

Thank You!
• Find out more: http://research.hypios.com/?page_id=165

Contact us:
• Danica Damljanovic: @dancheeee
• Milan Stankovic: @milstan