linked data-based concept recommendation: comparison of different methods in open innovation...

Post on 25-Jun-2015


DESCRIPTION

Concept recommendation is a widely used technique that helps users choose the right tags, improve their Web search experience, and perform a multitude of other tasks. In finding potential problem solvers in Open Innovation (OI) scenarios, concept recommendation is of crucial importance, as it can help to discover the right topics, directly or laterally related to an innovation problem. Such topics can then be used to identify relevant experts. We propose two Linked Data-based concept recommendation methods for topic discovery. The first one, hyProximity, exploits only the particularities of Linked Data structures, while the other one applies a well-known Information Retrieval method, Random Indexing, to the linked data. We compare the two methods against the baseline in gold standard-based and user study-based evaluations, using real problems and solutions from an OI company.

TRANSCRIPT

Linked Data-based Concept Recommendation: Comparison of Different Methods in Open Innovation Scenario

Danica Damljanovic, Milan Stankovic, Philippe Laublet

Innovation

Innovation Platforms

Challenge: Promote innovation problems to an audience of solvers who can propose relevant innovative solutions

Finding Meaningful Connections

Clay mining … Kaolinite extraction from rocks …

Different communities use different terms and concepts to speak about semantically related things. Such “language” defines communities and separates them. Being able to find meaningful connections between concepts would enable us to build bridges between people and content.

http://bit.ly/hyProximity

Concept recommendation
• Concepts you might not know but might want to use: to annotate your content, to search for content, to search for people…
• Help problem promoters discover relevant concepts (problem promoters are sometimes not field experts)
• Discovery = relevance + unexpectedness

Similarity: structure-based and statistical semantics
• hyProximity, a structure-based similarity
• Random Indexing, a well-known statistical semantics method from Information Retrieval, applied to RDF

Discovering Direct and Lateral Concepts

Linked Data-based Concept Recommendation

[Figure: pipeline diagram with the labels Zemanta, Textual Input, DBpedia Concepts found in the text, DBpedia Exploration, suggestions]

hyProximity

• We start from several seed concepts found directly in the text, and search the DBpedia graph
• The concepts found in the proximity of several seed concepts are considered more “in context” for the given input
• Concepts found at a shorter distance from the seed concepts have higher hyProximity

• Hierarchical: exploring skos:broader relations
• Transversal: exploring transversal links
• Mixed: a linear combination of hierarchical and transversal
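The three bullets about seed concepts can be illustrated with a small sketch. It is not the authors' hyProximity formula: the scoring rule (sum of 1/distance over all seeds), the `max_depth` cut-off, the function names and the toy graph edges are all assumptions made for illustration. It only shows the two stated properties: concepts near several seeds score higher, and shorter distances score higher.

```python
from collections import deque

def distances_from(graph, seed, max_depth):
    """Breadth-first search: hop count from `seed` to every node within `max_depth`."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def proximity_scores(graph, seeds, max_depth=2):
    """Assumed scoring rule: each seed contributes 1/distance to a candidate."""
    seed_set = set(seeds)
    scores = {}
    for seed in seeds:
        for node, d in distances_from(graph, seed, max_depth).items():
            if node in seed_set:
                continue  # seeds themselves are not suggestions
            scores[node] = scores.get(node, 0.0) + 1.0 / d
    return scores

# Hypothetical undirected toy graph; node names borrowed from the slides,
# edges invented for the example.
toy_graph = {
    "Paris": ["Seine", "Chanel"],
    "Seine": ["Paris", "Marne"],
    "Marne": ["Seine"],
    "Chanel": ["Paris"],
    "Peugeot": ["BMW"],
    "BMW": ["Peugeot"],
}
scores = proximity_scores(toy_graph, ["Paris", "Marne"])
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here "Seine" is one hop from both seeds and ranks above "Chanel", which is close to only one of them; "BMW" and "Peugeot" are unreachable and receive no score.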

Different Distance Functions

[Figure: example graph with the nodes Paris, Seine, Marne, Chanel, BMW, Peugeot and the categories Rivers in France, Cities in France, Things in France, Products of France, Car Industry; legend: skos:broader, other property; distance labels: 2, 2, 2, 2+1]

research.hypios.com/hyproximity

Different Distance Functions

[Figure: the same example graph (Paris, Seine, Marne, Chanel, BMW, Peugeot; Rivers in France, Cities in France, Things in France, Products of France, Car Industry) with the labels “fashion”, flows through, competitor, famous for; legend: skos:broader, other property; distance labels: 1, 1, 1]

• Hierarchical: exploring skos:broader relations
• Transversal: exploring transversal links
• Mixed: a linear combination of hierarchical and transversal
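The "mixed" variant is described only as a linear combination of the hierarchical and transversal scores. A minimal sketch of such a combination follows; the weight value, the function name and the example scores are assumptions, not values from the slides.

```python
def mixed_score(hierarchical, transversal, weight=0.5):
    """Linear combination of two score dictionaries keyed by concept.

    `weight` is applied to the hierarchical score and (1 - weight) to the
    transversal one; the default 0.5 is an arbitrary illustrative choice.
    """
    concepts = set(hierarchical) | set(transversal)
    return {
        c: weight * hierarchical.get(c, 0.0)
           + (1.0 - weight) * transversal.get(c, 0.0)
        for c in concepts
    }

# Hypothetical scores for three candidate concepts.
hierarchical_scores = {"Marne": 1.0, "Chanel": 0.5}
transversal_scores = {"Chanel": 1.0, "BMW": 0.5}
mixed = mixed_score(hierarchical_scores, transversal_scores)
```

A concept supported by both kinds of links ("Chanel" in this toy data) ends up ahead of concepts supported by only one.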

Random Indexing
• Words which appear in similar contexts (with the same set of other words) are contextually related, e.g. synonyms.
• Synonyms tend not to co-occur with one another directly, so indirect inference is required to draw associations between words used to express the same idea

Two steps to Random Indexing

• Indexing
  o For an RDF graph, generate virtual documents
  o Prepare the corpus (pre-processing)
  o Generate semantic index

• Search: given a term X, calculate the cosine similarity between the vector of that term and other vectors in the semantic space
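The two steps can be sketched in a few lines of standard-library Python. This is a generic Random Indexing illustration, not the authors' implementation: the dimensionality, the number of non-zero entries, the whitespace tokenisation and the three toy "virtual documents" are assumptions.

```python
import math
import random

def index_vector(dim, nonzeros, rng):
    """Sparse ternary vector: `nonzeros` entries are +1 or -1, the rest 0."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

def build_term_vectors(documents, dim=64, nonzeros=8, seed=0):
    """Indexing: each document gets a random index vector; a term's context
    vector is the sum of the index vectors of the documents it occurs in,
    counted once per occurrence."""
    rng = random.Random(seed)
    term_vectors = {}
    for doc in documents:
        doc_vec = index_vector(dim, nonzeros, rng)
        for term in doc.split():
            ctx = term_vectors.setdefault(term, [0] * dim)
            for i, value in enumerate(doc_vec):
                ctx[i] += value
    return term_vectors

def cosine(a, b):
    """Search: cosine similarity between two context vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical virtual documents.
docs = ["clay kaolinite mining", "clay kaolinite rocks", "fashion perfume"]
vectors = build_term_vectors(docs)
same_context = cosine(vectors["clay"], vectors["kaolinite"])
other_context = cosine(vectors["clay"], vectors["fashion"])
```

Terms that occur in exactly the same documents ("clay" and "kaolinite") receive identical context vectors, so their cosine similarity is maximal, while terms from unrelated documents score lower.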

Building context vectors

Random index vectors of the documents:

d1    0   0  -1   1  -1   1
d2   -1   1   0   0   1  -1
…
dp    0   1   0  -1  -1   1

Term-document matrix:

      d1  d2  ..  dp
t1     1   2  ..   0
t2     3   0  ..   0
..    ..  ..  ..  ..
tq     0   1  ..  10

[Figure labels: X =, t1 t2 … tq, Dimensionality = n, Seed length, M, D, T]
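Reading the slide as standard Random Indexing (an assumption, since the figure itself did not survive extraction), each term's context vector is the sum of the document index vectors weighted by the term's count in each document. The sketch below uses only the values visible on the slide; the elided ".." columns are ignored, and the tq row is read as counts for d1, d2 and dp.

```python
# Index vectors of the three documents shown on the slide (6 dimensions).
index_vectors = {
    "d1": [0, 0, -1, 1, -1, 1],
    "d2": [-1, 1, 0, 0, 1, -1],
    "dp": [0, 1, 0, -1, -1, 1],
}

# Term-document counts for the visible columns only (assumed reading).
counts = {
    "t1": {"d1": 1, "d2": 2, "dp": 0},
    "t2": {"d1": 3, "d2": 0, "dp": 0},
    "tq": {"d1": 0, "d2": 1, "dp": 10},
}

def context_vector(term):
    """Weighted sum of document index vectors for one term."""
    dim = len(next(iter(index_vectors.values())))
    total = [0] * dim
    for doc, count in counts[term].items():
        for i, value in enumerate(index_vectors[doc]):
            total[i] += count * value
    return total

context = {term: context_vector(term) for term in counts}
# context["t1"] == [-2, 2, -1, 1, 1, -1]
# context["t2"] == [0, 0, -3, 3, -3, 3]
# context["tq"] == [-1, 11, 0, -10, -9, 9]
```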

Indexing: virtual documents

[Figure: the representative subgraph for URI=S (nodes S, O1, O2; P1 to P10; L1 to L8) is lexicalised into the virtual document for URI=S]
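One way to picture the lexicalisation step is to turn every triple that touches a URI into a run of label tokens. The sketch below is an assumed simplification, not the procedure from the slides: it looks only at triples directly involving the URI, and the `ex:` identifiers, labels and triples are hypothetical.

```python
def virtual_document(triples, labels, uri):
    """Concatenate the labels of every triple in which `uri` is subject or object.

    Identifiers without an entry in `labels` are kept as-is.
    """
    tokens = []
    for subject, predicate, obj in triples:
        if uri in (subject, obj):
            tokens.extend(labels.get(x, x) for x in (subject, predicate, obj))
    return " ".join(tokens)

# Hypothetical triples and labels.
triples = [
    ("ex:Paris", "ex:famousFor", "ex:Chanel"),
    ("ex:Seine", "ex:flowsThrough", "ex:Paris"),
    ("ex:BMW", "ex:competitor", "ex:Peugeot"),
]
labels = {
    "ex:Paris": "Paris",
    "ex:Chanel": "Chanel",
    "ex:Seine": "Seine",
    "ex:famousFor": "famous for",
    "ex:flowsThrough": "flows through",
}
paris_doc = virtual_document(triples, labels, "ex:Paris")
```

The resulting text can then be treated like any other document in the corpus that Random Indexing is run on.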

Experiments
• 26 real innovation problems from Hypios
• Measure of success: the suggested concepts appear in the actual solutions (precision, recall, f-measure)
  (+) reasonable list of concepts from real scenarios
  (-) not complete:
  o User study: measure discovery = relevance + unexpectedness
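The gold-standard measures named above can be computed from two sets of concept URIs. This sketch uses the usual set-based definitions and the balanced F-measure (F1); whether the study used F1 or another weighting is not stated, and the example URIs are hypothetical.

```python
def precision_recall_f1(suggested, gold):
    """Set-based precision, recall and F1 of suggested concepts against gold ones."""
    suggested, gold = set(suggested), set(gold)
    hits = len(suggested & gold)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: 4 suggestions, 2 gold concepts found in the solutions.
suggested_uris = ["ex:Kaolinite", "ex:Clay", "ex:Mining", "ex:Quartz"]
solution_uris = ["ex:Kaolinite", "ex:Flotation"]
precision, recall, f1 = precision_recall_f1(suggested_uris, solution_uris)
```

With one hit out of four suggestions and two gold concepts, precision is 0.25, recall is 0.5 and F1 is one third.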

DBpedia Dataset
• Select a number of properties relevant to the Open Innovation-related scenario
• dbo:product, dbp:products, dbo:industry, dbo:service, dbo:genre, and properties serving to establish a hierarchical categorization of concepts, namely dc:subject and skos:broader
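Restricting the graph to a property whitelist can be sketched as a simple filter over triples. The hierarchical group (dc:subject, skos:broader) follows the slide; treating the remaining properties as the transversal group is an assumption, only a subset of them is listed for brevity, and the triples are hypothetical.

```python
# Properties the slide names as hierarchical categorisation.
HIERARCHICAL = {"dc:subject", "skos:broader"}
# Assumed transversal group: a subset of the other listed properties.
TRANSVERSAL = {"dbo:product", "dbo:industry", "dbo:service", "dbo:genre"}

def split_by_kind(triples):
    """Keep only whitelisted triples and separate them into the two groups."""
    hierarchical = [t for t in triples if t[1] in HIERARCHICAL]
    transversal = [t for t in triples if t[1] in TRANSVERSAL]
    return hierarchical, transversal

# Hypothetical triples written with prefixed names.
sample = [
    ("ex:Peugeot", "dbo:industry", "ex:Car_Industry"),
    ("ex:Peugeot", "dc:subject", "ex:Category_Products_of_France"),
    ("ex:Category_Products_of_France", "skos:broader", "ex:Category_Things_in_France"),
    ("ex:Peugeot", "rdfs:label", "Peugeot"),
]
hier, trans = split_by_kind(sample)
```

Triples with properties outside both sets, such as the `rdfs:label` one here, are dropped from the exploration graph.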

Evaluation
• “Gold standard”
  o Extract problem URIs
  o Extract solution URIs
• Baseline:
  o Google Adwords Keyword Tool: finds similar topics based on their distribution in textual corpora and the corpora of search queries.
  o Suggesting up to 600 concepts which are then used for Web crawling for finding experts.

Evaluation:  Results


User Study
• Suggestions being both relevant and unexpected
  o the most valuable discoveries for the user
• 12 users
• 34 problem evaluations
  o 3060 suggested concepts/keywords (34 evaluations × 3 methods × 30 suggestions)
• For the chosen innovation problem, the evaluators were presented with the lists of 30 top-ranked suggestions generated by adWords, hyProximity (mixed approach) and Random Indexing.

Example

User  Study:  Results

Conclusion
• Linked Data valuable source of knowledge for concept recommendation
• Our two methods complementary
  o hyProximity better for precision
  o Random Indexing better for recall
• User study: unexpectedness higher with our methods than with baseline
• Subjective user comment:
  o Random Indexing: generic
  o hyProximity: granular
  o adWords: redundant

Thank You!
• Find out more: http://research.hypios.com/?page_id=165

Contact us:
• Danica Damljanovic: @dancheeee
• Milan Stankovic: @milstan
