Is This Entity Relevant to Your Needs? - CIKM 2012

IBM Research - Haifa © 2012 IBM Corporation Is This Entity Relevant to Your Needs? David Carmel IBM Research - Haifa, Israel

Upload: david-carmel

Post on 29-Nov-2014

Page 1: Is This Entity Relevant to Your Needs? - CIKM2012

IBM Research - Haifa © 2012 IBM Corporation

Is This Entity Relevant to Your Needs?

David Carmel

IBM Research - Haifa, Israel

Page 2: Is This Entity Relevant to Your Needs? - CIKM2012

Outline

• Some open questions in Entity Oriented Search (EoS)
  • What makes an entity relevant to the user's needs?
  • Is it the same relevance that the IR community deals with?
  • Can we adopt existing IR models in this new area?
• The classical model of relevance in IR
  • User-based relevance
  • Topical relevance (aboutness)
  • Similarity-based relevance measurements
• Supportive evidence as an indication of relevance
  • For Q&A
  • For EoS
• Relevance estimation approaches for EoS
• Exploration and discovery in EoS
• Summary

Page 3: Is This Entity Relevant to Your Needs? - CIKM2012

Entity Oriented Search (EoS)

• When people use retrieval systems, they are often not searching for documents or text passages
• Named entities often play a central role in answering such information needs
  • Persons, organizations, locations, products…
• At least 20-30% of the queries submitted to Web search engines are simply named entities
• ~71% of Web search queries contain named entities
  (Named entity recognition in query, Guo et al., SIGIR 2009)

Page 4: Is This Entity Relevant to Your Needs? - CIKM2012

Popular Entity Oriented Search Tools

• Product search
  • Online shopping (books, movies, electronic devices…)
    • Amazon, eBay…
  • Travel (places, hotels, flights…)
    • Yahoo! Travel, Kayak…
  • Multimedia (music, video, images…)
    • Last.fm, YouTube, Flickr…
• People search
  • Expert search (for a specific topic)
    • LinkedIn, ArnetMiner…
  • Friends (colleagues, other people with mutual interests, lost friends…)
    • Facebook…
• Location search
  • Addresses
  • Businesses
  • Proximity search (find sites close to the searcher's current location)

Page 6: Is This Entity Relevant to Your Needs? - CIKM2012

Expert Search

• How can knowledgeability be measured?
• How should persons be ranked in response to a query, so that those with relevant expertise are ranked first?

The task:
• Identify people who are knowledgeable on a specific topic
• Find people who have skills and experience on a given topic

Page 7: Is This Entity Relevant to Your Needs? - CIKM2012

Do these entities satisfy our needs?

• What makes an entity relevant to the user's need?
• What is the meaning of relevance in this context?
  • Is it the same relevance that the IR community has dealt with for many decades in the context of document retrieval?
  • Can we adopt existing IR models in this new area of Entity Oriented Search in a straightforward manner?
• In this talk I'll try to deal with some of those questions
  • I'll review how the same questions are handled in related areas (especially in Q&A)
  • I'll raise some research directions that might lead to a better understanding of the concept of relevance in EoS

Page 8: Is This Entity Relevant to Your Needs? - CIKM2012

What is an Entity?

• Entity: an object or a “thing” that can be uniquely identified in the world
  • An entity must be distinguishable from other entities
  • Can be anything (including an abstract thing!)
• Attributes: used to describe entities
  • An attribute contains a single piece of information
• Key: a minimal set of attributes that uniquely identifies an entity
• Entity set: a set of entities of the same type, with the same attributes

[Figure: ER diagram of an Actor entity set with attributes id, name, address, birthday]

Page 9: Is This Entity Relevant to Your Needs? - CIKM2012

What is a Relationship?

• Relationship: an association among two or more entities
• A relationship may also have attributes
• Relationship set: a set of relationships of the same type

[Figure: ER diagram of a Prescription relationship among Patient (id, name), Physician (id), and Medication (code, name), with a Date attribute]

Page 10: Is This Entity Relevant to Your Needs? - CIKM2012

Example: ERD for Social Search in the Enterprise

[Figure: entity-relationship diagram; the only surviving label is “Creator”]

Page 11: Is This Entity Relevant to Your Needs? - CIKM2012

Entity Relationship Graph (ERG)

Represents
• Entity instances as graph nodes
• Binary relationships as (weighted) edges
• N-ary relations are broken into binary ones
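
A minimal sketch of such a graph, under assumptions that are not in the slides: the class name `EntityRelationshipGraph`, the choice to break an n-ary relationship into one edge per pair of participants, and the accumulation of weights for repeated relationships are all illustrative. Other decompositions are possible, for example turning the relationship itself into a node.

```python
from collections import defaultdict
from itertools import combinations


class EntityRelationshipGraph:
    """Entity instances are nodes; binary relationships are weighted, undirected edges."""

    def __init__(self):
        self.edges = defaultdict(dict)  # node -> {neighbor: weight}

    def add_edge(self, u, v, weight=1.0):
        # Assumption: repeated relationships between the same pair accumulate weight.
        self.edges[u][v] = self.edges[u].get(v, 0.0) + weight
        self.edges[v][u] = self.edges[v].get(u, 0.0) + weight

    def add_nary(self, participants, weight=1.0):
        # Assumption: an n-ary relationship becomes one binary edge per pair of participants.
        for u, v in combinations(participants, 2):
            self.add_edge(u, v, weight)


erg = EntityRelationshipGraph()
# A ternary Prescription relationship among a patient, a physician, and a medication.
erg.add_nary([("Patient", 17), ("Physician", 3), ("Medication", "M42")])
```

The ternary relationship ends up as three binary edges, so every participant is a direct neighbor of the other two.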

Page 12: Is This Entity Relevant to Your Needs? - CIKM2012

Entity Oriented Search (EoS)

[Figure: system architecture with the components Entity Relationship Data, Entity Relationship Index (entities, relations), and Runtime (ranking, navigation, exploration); the input is a query (free text, entity, or hybrid) and the output is related entities and relationships]

Query examples:
• Nikon D40
• Teammates of Michael Schumacher
• “Data mining”

Page 13: Is This Entity Relevant to Your Needs? - CIKM2012

The Concept of Relevance in IR

Page 14: Is This Entity Relevant to Your Needs? - CIKM2012

The Classical Concept of Relevance in IR (Saracevic76, Mizzaro96)

Problem → Information Need → Request → Query → Judgment

• P (Problem): the user has a problem to solve or an aim to achieve
• IN (Information Need): the user builds a mental, implicit representation of P (may be incorrect or incomplete)
• R (Request): the user expresses IN explicitly, usually in natural language (sometimes with the help of an intermediary)
• Q (Query): formalization; R is translated into a formal query understandable by the search system
• J (Judgment): the same user judges the RELEVANCE of the search results

Page 15: Is This Entity Relevant to Your Needs? - CIKM2012

User-based (Subjective) Relevance

• Relevance is a dynamic concept that depends on the user's subjective judgment
• Subjective relevance judgment may depend on:
  • The user's characteristics and perceptions
    • Gender, age, education, income, occupation…
    • Preferences, interests
    • State of mind
  • The context of the search
    • Level of the user's expertise (regarding the topic of interest)
    • Current time
    • Current location
    • Session status
  • Dependencies between the retrieved items for
    • the specific query
    • sequential queries during the session

Page 16: Is This Entity Relevant to Your Needs? - CIKM2012

Topical Relevance Judgment

• How well the topic of the retrieved information matches the topic of the request
• An object is objectively relevant to a request if it deals with the topic of the request (aboutness)
• TREC working definition for relevance assessment:
  If you are writing a report on the topic and would use the information contained in the document in the report, then the document is considered relevant to the topic…
• A document is judged relevant if any piece of it is relevant, regardless of how small that piece is in relation to the rest of the document

Page 17: Is This Entity Relevant to Your Needs? - CIKM2012

Probability Ranking Principle

• Given a set of documents that “match” the entity-oriented query
  • How do we rank them for the user?

$\Pr(R = 1 \mid d, q)$ for a document $d$, $\Pr(R = 1 \mid e, q)$ for an entity $e$

We need a reliable and coherent methodology for measuring the probability of relevance of an entity to a query

The Probability Ranking Principle (PRP) for Document Retrieval (Robertson 71):
• “If a retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request,
• where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose,
• the overall effectiveness of the system to its user will be the best…”

Page 18: Is This Entity Relevant to Your Needs? - CIKM2012

Relevance Estimation in Classic Document Retrieval

• Most relevance approximation approaches for document retrieval are based on measuring some kind of similarity between the user's query and the retrieved documents
• Vector space:
  • The cosine of the angle between two vectors
• Concept space:
  • Similarity in the latent concept space
    • e.g. LDA, LSI, ESA
• Language models:
  • Similarity between the document and the query term distributions

Can we use similar approaches for EoS?
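
As a concrete reference point for the vector space bullet, here is a small sketch of cosine-based ranking. It assumes raw term-frequency vectors and whitespace tokenization, neither of which comes from the slides; real systems typically use weighted vectors such as TF-IDF.

```python
import math
from collections import Counter


def cosine(query_terms, doc_terms):
    """Cosine of the angle between two raw term-frequency vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)  # a Counter returns 0 for missing terms
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


query = "entity search".split()
docs = {
    "d1": "entity oriented search ranks entity instances".split(),
    "d2": "cooking pasta at home".split(),
}
# Rank documents by decreasing similarity to the query.
ranking = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
```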

Page 19: Is This Entity Relevant to Your Needs? - CIKM2012

Entity Similarity

• While similarity plays a central role in relevance estimation for document retrieval, many relevant entities are not similar to the queried entity
  • At least according to standard definitions of similarity
• This problem is well known in the Question Answering domain
  • The answer is not necessarily “similar” to the question
  • The supportive passage is not always similar to the question
• Example: Who killed JFK?

John F. Kennedy (JFK), the thirty-fifth President of the United States, was assassinated at 12:30 p.m. Central Standard Time (18:30 UTC) on Friday, November 22, 1963, in Dealey Plaza, Dallas, Texas.

The ten-month investigation of the Warren Commission of 1963–1964 concluded that the President was assassinated by Lee Harvey Oswald.

Page 20: Is This Entity Relevant to Your Needs? - CIKM2012

Relevance Judgment in Question Answering

• In QA we usually assume a question that identifies the information need “precisely”
  • Who was the first American in space?
  • How many calories are there in a Big Mac?
  • How many Grand Slam titles did Bjorn Borg win?
• When is an answer considered relevant to the question?
  • It must be correct!
  • I.e., it must have supportive evidence (from reliable sources)

A prominent factor in answering a question is not so much finding an answer as validating whether the candidate answer is correct
• Therefore supportive evidence is essential
• Assessment instructions from the TREC QA track:
  Assessors read each candidate answer and make a binary decision as to whether or not the candidate is actually an answer to the question in the context provided by the supportive document

Page 21: Is This Entity Relevant to Your Needs? - CIKM2012

What do you mean the answer is correct?

As in document retrieval, correctness/relevance in QA might be subjective and user dependent

Where is the Taj Mahal?
• Agra, India? (the famous mausoleum)
• Atlantic City, NJ? (the casino)

In TREC, it is common to consider each candidate answer with (relevant) supportive evidence as a correct one
• This suggests how various candidate answers can be ranked:
  • I.e., relevance judgment is transformed into judging the relevance of the supporting evidence
• This approach can be applied to Entity Oriented Search
  • Rank retrieved entities according to the amount and quality of their supportive evidence!

Entity ranking should be based on the supportive evidence for the entities' relevance to the query

Page 22: Is This Entity Relevant to Your Needs? - CIKM2012

Relevance Estimation Approaches for EoS

Page 23: Is This Entity Relevant to Your Needs? - CIKM2012

The Expert Profile Based Approach (Craswell et al. 2001)

• Represent each person by a virtual document (a profile)
  • Employee directory (in the enterprise)
  • Concatenating all existing passages mentioning the person
• Rank those profiles according to their relevance to the query
  • Using standard IR ranking techniques
• The user profile can naturally be used as supportive evidence for the user's expertise
• Difficulties:
  • Co-reference resolution and name disambiguation
  • Privacy concerns
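
The two steps of the profile approach can be sketched as follows. This is an illustration only: the helper names `build_profiles` and `rank_profiles` are hypothetical, person mentions are assumed to be resolved already, and length-normalized term frequency stands in for whatever standard IR ranking function a real system would use.

```python
from collections import Counter, defaultdict


def build_profiles(passages):
    """Concatenate every passage that mentions a person into that person's virtual document."""
    profiles = defaultdict(list)
    for person, text in passages:
        profiles[person].extend(text.lower().split())
    return profiles


def rank_profiles(query, profiles):
    """Rank people by length-normalized query-term frequency in their profile."""
    terms = query.lower().split()
    scores = {}
    for person, tokens in profiles.items():
        counts = Counter(tokens)
        scores[person] = sum(counts[t] for t in terms) / max(len(tokens), 1)
    return sorted(scores, key=scores.get, reverse=True)


# Assumption: each passage has already been attributed to the person it mentions.
passages = [
    ("alice", "Alice presented a paper on entity search"),
    ("bob", "Bob maintains the build system"),
    ("alice", "Search quality review led by Alice"),
]
experts = rank_profiles("entity search", build_profiles(passages))
```

The matching passages in a top-ranked profile double as the supportive evidence shown to the user.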

Page 24: Is This Entity Relevant to Your Needs? - CIKM2012

EoS: Voting Approach (Balog06, MacDonald09)

• Any relevant document is a “voter” for the entities it mentions or relates to
• What is the rationale?
  • An entity mentioned many times in relevant (top retrieved) documents is more likely to be relevant to the given topic

$$\mathrm{Score}(p, q) = \sum_{d} \mathrm{Score}(d, q) \cdot \mathrm{Score}(p, d)$$

[Figure: the query q, retrieved documents d1-d3, and persons p1-p3]
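
A small worked instance of the voting formula, with made-up scores. The dictionaries `doc_scores` (Score(d, q)) and `mentions` (Score(p, d)) are illustrative inputs; how those two scores are estimated is left open here.

```python
from collections import defaultdict


def vote(doc_scores, mentions):
    """Score(p, q) = sum over d of Score(d, q) * Score(p, d)."""
    person_scores = defaultdict(float)
    for doc, doc_score in doc_scores.items():
        for person, association in mentions.get(doc, {}).items():
            person_scores[person] += doc_score * association
    return dict(person_scores)


# Hypothetical retrieval scores and document-person association strengths.
doc_scores = {"d1": 0.9, "d2": 0.5, "d3": 0.2}
mentions = {
    "d1": {"p1": 1.0, "p2": 0.5},
    "d2": {"p2": 1.0},
    "d3": {"p3": 1.0},
}
scores = vote(doc_scores, mentions)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here p2 outranks p1 even though p1 is the main person in the best document, because p2 collects votes from two relevant documents.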

Page 25: Is This Entity Relevant to Your Needs? - CIKM2012

Relevance Propagation (Serdyukov 2008)

• We should also consider entities that are indirectly related to the query
• Relevance is propagated through the entity relationship graph

[Figure: graph over the query q, documents d1-d4, and persons p1-p4]

How should relevance be propagated in the graph?

Page 26: Is This Entity Relevant to Your Needs? - CIKM2012

Proximity in the Entity Relationship Graph - Random Walks

• Random walk approach
  • The relationship strength between two nodes is reflected by the probability that a random surfer who starts at one node will visit the second one during the walk
• Justification
  • The more paths that connect the two entities in the graph,
  • the higher the probability that the surfer will visit the target entity,
  • and the higher the relationship strength between the two

Popular Random Walk Approaches
• SimRank(u,v): how soon two random surfers (starting at u and v) are expected to meet at the same node
• Random Walk with Restart (RWR): the surfer has a fixed restart probability of returning to the source
• Lazy Random Walk: the surfer has a fixed probability of halting the walk at each step
• Effective Conductance: only simple (cycle-free) paths are considered, treating edges as resistors
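
To make the Random Walk with Restart variant concrete, here is a power-iteration sketch. The restart probability of 0.15, the fixed iteration count, the toy graph, and the handling of dangling nodes are all assumptions for illustration; the graph is given as a symmetric adjacency dictionary in which every node appears as a key.

```python
def random_walk_with_restart(graph, source, restart=0.15, iterations=100):
    """Approximate the visit probabilities of a walk that jumps back to
    `source` with probability `restart` at every step."""
    nodes = list(graph)
    prob = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            total = sum(graph[u].values())
            if total == 0:
                # Assumption: a node without edges sends its mass back to the source.
                nxt[source] += (1 - restart) * prob[u]
                continue
            for v, weight in graph[u].items():
                nxt[v] += (1 - restart) * prob[u] * weight / total
        nxt[source] += restart  # total probability mass stays 1
        prob = nxt
    return prob


graph = {
    "q": {"d1": 1.0, "d2": 1.0},
    "d1": {"q": 1.0, "p1": 1.0, "p2": 1.0},
    "d2": {"q": 1.0, "p2": 1.0},
    "p1": {"d1": 1.0},
    "p2": {"d1": 1.0, "d2": 1.0},
}
proximity = random_walk_with_restart(graph, "q")
```

In this toy graph p2 is connected to q through two documents and p1 through one, so p2 receives the higher proximity, in line with the justification above.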

Page 27: Is This Entity Relevant to Your Needs? - CIKM2012

Markov Random Fields for EoS (Raviv, Carmel, Kurland, 2012)

$$P(E \mid Q) \triangleq \sum_{P \in \{D, T, N\}} \lambda_{E_P} \, P(E_P \mid Q)$$

$$Q = \langle \{q_1, \ldots, q_n\}, T \rangle$$

Page 28: Is This Entity Relevant to Your Needs? - CIKM2012

MRF Based Entity Document Scoring P(E_D|Q)

• We consider three types of cliques
  • Full independence
  • Sequential dependence
  • Full dependence
• The feature function over cliques
  • Measures how well the clique's terms represent the entity document
  • Is based on a Dirichlet smoothed language model
  • For the dependent models we replace $q_i$ with #1($q_i \ldots q_{i+k}$) and #uwN($\{q_i, \ldots, q_j\}$), respectively
• The entity document scoring function aggregates the feature functions over all clique types

$$P(E_D \mid Q) \triangleq \sum_{I \in \{T, O, U\}} \lambda^{D}_{I} \sum_{c \in I} f^{D}_{I}(c)$$

$$f^{D}_{T}(q_i, E_D) = \log \frac{tf(q_i, E_D) + \mu \cdot \frac{cf(q_i)}{|C|}}{|E_D| + \mu}$$
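
A sketch of the unigram (T) feature function and of the full-independence score, which sums that feature over the query terms. The function names, the tiny collection, and the value of `mu` are assumptions for illustration, and the clique weights λ are omitted.

```python
import math
from collections import Counter


def dirichlet_feature(term, entity_doc, coll_counts, coll_len, mu):
    """log of (tf(q_i, E_D) + mu * cf(q_i) / |C|) / (|E_D| + mu)."""
    tf = entity_doc.count(term)
    cf = coll_counts.get(term, 0)
    smoothed = (tf + mu * cf / coll_len) / (len(entity_doc) + mu)
    return math.log(smoothed) if smoothed > 0 else float("-inf")


def score_entity_document(query_terms, entity_doc, coll_counts, coll_len, mu=10.0):
    """Full independence: sum the unigram feature over all query terms."""
    return sum(
        dirichlet_feature(t, entity_doc, coll_counts, coll_len, mu)
        for t in query_terms
    )


# Hypothetical entity documents; the collection is simply their union.
entity_docs = {
    "e1": "nikon d40 digital camera".split(),
    "e2": "formula one racing driver".split(),
}
coll_counts = Counter(t for doc in entity_docs.values() for t in doc)
coll_len = sum(coll_counts.values())
query = ["nikon", "camera"]
scores = {
    e: score_entity_document(query, doc, coll_counts, coll_len)
    for e, doc in entity_docs.items()
}
```

Thanks to the smoothing term, e2 still gets a finite score although it contains neither query term.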

Page 29: Is This Entity Relevant to Your Needs? - CIKM2012

Entity Type Scoring P(E_T|Q)

• We measure the “similarity” between the query type and the entity type

$$P(E_T \mid Q) \triangleq f^{T}(c) = \log \frac{e^{-\alpha \, d(Q_T, E_T)}}{\sum_{E' \in R} e^{-\alpha \, d(Q_T, E'_T)}}$$

• $d(Q_T, E_T)$, the type distance, is domain dependent
• In our experiments we measured the distance in the Wikipedia category graph
  • The minimal path length between all pairs of the query's categories and the entity page's categories
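
The type distance and the resulting score can be sketched as below, assuming a negative exponent on the distance so that closer types score higher. The category graph is a made-up miniature rather than the Wikipedia category graph, `alpha` is an arbitrary value, and the normalization runs over the candidate entities passed in.

```python
import math
from collections import deque


def path_length(graph, start, goal):
    """Breadth-first search for the shortest path length in an undirected graph."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return math.inf


def type_distance(graph, query_categories, entity_categories):
    """Minimal path length over all pairs of query and entity categories."""
    return min(
        path_length(graph, q, e)
        for q in query_categories
        for e in entity_categories
    )


def type_scores(distances, alpha=1.0):
    """log of exp(-alpha * d), normalized over all candidate entities.
    Assumes at least one candidate has a finite distance."""
    weights = {e: math.exp(-alpha * d) for e, d in distances.items()}
    total = sum(weights.values())
    return {e: math.log(w / total) if w > 0 else -math.inf for e, w in weights.items()}


# Hypothetical category graph, stored as symmetric adjacency lists.
categories = {
    "Cameras": ["Digital cameras", "Technology"],
    "Digital cameras": ["Cameras"],
    "Technology": ["Cameras", "Sports"],
    "Sports": ["Technology", "Racing drivers"],
    "Racing drivers": ["Sports"],
}
distances = {
    "e1": type_distance(categories, ["Cameras"], ["Digital cameras"]),
    "e2": type_distance(categories, ["Cameras"], ["Racing drivers"]),
}
scores = type_scores(distances)
```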

Page 30: Is This Entity Relevant to Your Needs? - CIKM2012

Entity Name Scoring P(E_N|Q)

• We measure the dependency between the query term(s) and the entity name
  • Globally
    • Measure the proximity between the query term(s) and the entity name in the whole collection
    • We use pointwise mutual information (PMI): the likelihood of finding one term in proximity to another term
  • Locally
    • Measure the proximity between the query terms and the entity name in the top retrieved documents

$$P(E_N \mid Q) = \sum_{X \in A} \lambda^{N}_{X} \sum_{c \in X} f^{N}_{X}(c), \qquad A = \{S, T, O, U, \mathrm{PMI}_T, \mathrm{PMI}_O, \mathrm{PMI}_U\}$$
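
For the global PMI feature, a minimal sketch of the standard PMI estimate over text windows. Representing the collection as a list of token sets, one per window, is an assumption made for brevity, as is returning negative infinity when the two terms never co-occur.

```python
import math


def pmi(term, name, windows):
    """log of P(term, name) / (P(term) * P(name)), estimated over text windows."""
    n = len(windows)
    with_term = sum(1 for w in windows if term in w)
    with_name = sum(1 for w in windows if name in w)
    with_both = sum(1 for w in windows if term in w and name in w)
    if with_both == 0:
        return float("-inf")
    return math.log((with_both / n) / ((with_term / n) * (with_name / n)))


# Hypothetical collection: each window is the set of tokens it contains.
windows = [
    {"schumacher", "ferrari", "teammate"},
    {"schumacher", "ferrari"},
    {"ferrari", "engine"},
    {"pasta", "recipe"},
]
related = pmi("schumacher", "ferrari", windows)
unrelated = pmi("schumacher", "pasta", windows)
```

A positive value means the two terms co-occur more often than independence would predict.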

Page 31: Is This Entity Relevant to Your Needs? - CIKM2012

Experimental Results over the INEX Entity Track (2007-2009)

[Figure: three bar charts (Full independence, Sequential dependence, Full dependence), each comparing S(ED), S(ED,ET), S(ED,ET,EN), and INEX top for 2007, 2008, and 2009; vertical axis from 0 to 0.4]

• Results improve significantly when type and name scoring are added
• Final results are superior to the top INEX results in 2007 and 2008, and comparable in 2009
• Dependence models have not improved over the independence model??

Page 32: Is This Entity Relevant to Your Needs? - CIKM2012

Exploratory EoS

• When only an entity is given as input, the information need is quite fuzzy
  • Any related entity has the potential to be relevant
  • Therefore any related entity should be retrieved!
  • High diversity in search results (entity types, relationship types)
• How can we help the user find the most relevant answers?
• Iterative IR: let the user navigate and explore the ER graph
• Facet search:
  • Categorize the search results according to their facets (entity types, attributes…)
  • Let the user drill down: restrict retrieved entities to a specific facet
  • NOTE: we still need to rank the search results in each of the facets!
• Graph navigation:
  • Let the user explore the graph by using a retrieved entity as a pivot to a new search
  • Query reformulation
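
Facet search as described above can be sketched in a few lines, assuming results arrive as (entity, type, score) triples. The triple layout, the helper names, and the example entities are illustrative only.

```python
from collections import defaultdict


def facet_results(results):
    """Group (entity, type, score) results by type and rank each facet by score."""
    facets = defaultdict(list)
    for entity, entity_type, score in results:
        facets[entity_type].append((entity, score))
    return {
        facet: sorted(items, key=lambda item: item[1], reverse=True)
        for facet, items in facets.items()
    }


def drill_down(results, entity_type):
    """Restrict the retrieved entities to a single facet."""
    return [r for r in results if r[1] == entity_type]


# Hypothetical mixed-type result list.
results = [
    ("John Smith", "person", 0.90),
    ("social-software", "tag", 0.70),
    ("Enterprise Social Blog", "blog", 0.80),
    ("Jane Roe", "person", 0.95),
]
facets = facet_results(results)
people_only = drill_down(results, "person")
```

Each facet keeps its own ranking, which reflects the note that results still have to be ranked within every facet.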

Page 33: Is This Entity Relevant to Your Needs? - CIKM2012

Search over Social Media Data (SaND) (Carmel 2009, Guy 2010)

• SaND provides social aggregation over social data
• SaND builds an entity-entity relationship matrix that maps a given entity to all related entities, weighted by their relationship strength
  • Direct relations of a user to:
    • a document: as an author, tagger, or commenter
    • another user: as a friend or as a manager/employee
    • a tag: one she used, or one she was tagged with by others
    • a group: as a member/owner
  • Indirect relations:
    • Two entities are indirectly related if both are directly related to the same entity
• The overall relationship strength between two entities is determined by a linear combination of their direct and indirect relationship strengths
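
One possible reading of the linear combination, as a sketch. The slides do not say how the indirect strength is computed or how the two parts are weighted, so the product-sum over shared neighbors, the mixing weight `beta`, and the example strengths are all assumptions, not SaND's actual formula.

```python
def indirect_strength(direct, a, b):
    """Assumption: sum, over every entity directly related to both a and b,
    of the product of the two direct strengths."""
    shared = set(direct.get(a, {})) & set(direct.get(b, {}))
    return sum(direct[a][m] * direct[b][m] for m in shared)


def relationship_strength(direct, a, b, beta=0.7):
    """Linear combination of the direct and indirect relationship strengths."""
    direct_part = direct.get(a, {}).get(b, 0.0)
    return beta * direct_part + (1 - beta) * indirect_strength(direct, a, b)


# Hypothetical direct-relationship matrix, stored as nested dictionaries.
direct = {
    "alice": {"doc1": 1.0, "bob": 0.5, "tag:search": 0.8},
    "bob": {"alice": 0.5, "tag:search": 0.5},
    "carol": {"doc1": 0.6},
}
alice_bob = relationship_strength(direct, "alice", "bob")
alice_carol = relationship_strength(direct, "alice", "carol")
```

Alice and Carol have no direct relation, yet they end up related through the document both are connected to.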

Page 34: Is This Entity Relevant to Your Needs? - CIKM2012

[Screenshot: search for the term ‘social’, with the following callouts]

• Results contain different types of entities: blogs, communities, bookmarked documents, etc.
• Popular, higher ranked results appear higher in the result set.
• Related People: ranked list of people that are related to the topic and to the result set, in one or more relationship types (author, commenter, tagger, etc.)
• Related Tags: ranked tag cloud for this result set.

Page 35: Is This Entity Relevant to Your Needs? - CIKM2012

• Hovering over a result highlights the related people and tags
• Narrowing the search to Luis Suarez's related results

Page 36: Is This Entity Relevant to Your Needs? - CIKM2012

• Viewing results for the query ‘social’ and the person ‘Luis Suarez’
• Viewing Luis's business card, and results related to him

Page 37: Is This Entity Relevant to Your Needs? - CIKM2012

Summary

• In this talk we raised several questions related to the concept of relevance in EoS:
  • What makes an entity relevant to the user's need?
  • What is the meaning of relevance in this context?
  • Is it the same notion of relevance used in document retrieval?
• We argued that the relevance of an entity can be estimated according to the supportive evidence provided by the search system
• We discussed common EoS retrieval techniques:
  • The profile based approach
  • The voting approach
  • Relevance propagation
• We discussed several examples of EoS systems and how relevance estimation can be applied in these domains
• We claimed that the scale and diversity of EoS search results demand exploratory search techniques such as facet search and graph navigation

Page 38: Is This Entity Relevant to Your Needs? - CIKM2012

Open Questions and Challenges

• Entity similarity
  • While similarity plays a central role in relevance judgment for document retrieval, entity similarity measurement is not yet well understood
  • Attribute based similarity, evidence based similarity
  • Graph proximity
  • Hybrid approaches
• The clustering hypothesis:
  • Are two “similar” entities likely to be relevant to the same information need?
  • Challenges
    • To what extent are relevant entities indeed similar to each other,
    • and according to which similarity measurement?
• Relevance propagation: what relationship types provide effective relevance propagation channels?
  • Do your friends inherit your own expertise?
  • Which relationship types contribute to relevance propagation?

Page 39: Is This Entity Relevant to Your Needs? - CIKM2012

Thank You!

Questions?
