Is This Entity Relevant to Your Needs? - CIKM 2012
IBM Research - Haifa © 2012 IBM Corporation
Is This Entity Relevant to Your Needs?
David Carmel
IBM Research - Haifa, Israel
Outline
• Some Open Questions in Entity Oriented Search (EoS)
  • What makes an entity relevant to the user’s needs?
  • Is it the same relevance that the IR community deals with?
  • Can we adopt existing IR models into this new area?
• The classical model of relevance in IR
  • User-based relevance
  • Topic-based relevance (aboutness)
  • Similarity-based relevance measurements
• Supportive evidence as an indication of relevance
  • For Q&A
  • For EoS
• Relevance estimation approaches for EoS
• Exploration & discovery in EoS
• Summary
Entity Oriented Search (EoS)
• When people use retrieval systems, they are often not searching for documents or text passages
• Named entities often play a central role in answering such information needs: persons, organizations, locations, products…
• At least 20-30% of the queries submitted to Web search engines are simply named entities
• ~71% of Web search queries contain named entities
(Named Entity Recognition in Query, Guo et al., SIGIR 2009)
Popular Entity Oriented Search tools
• Product Search
  • Online Shopping (books, movies, electronic devices…)
    • Amazon, eBay…
  • Travel (places, hotels, flights…)
    • Yahoo! Travel, Kayak…
  • Multimedia (music, video, images…)
    • Last.fm, YouTube, Flickr…
• People Search
  • Expert Search (for a specific topic)
    • LinkedIn, ArnetMiner…
  • Friends (colleagues, other people with mutual interests, lost friends…)
    • Facebook…
• Location Search
  • Addresses
  • Businesses
  • Proximity Search (find sites close to the searcher’s current location)
Expert Search
• How can knowledge be measured?
• How should persons be ranked in response to a query, such that those with relevant expertise are ranked first?
The task:
• Identify people who are knowledgeable on a specific topic
• Find people who have skills and experience on a given topic
Do these entities satisfy our needs?
• What makes an entity relevant to the user’s need?
• What is the meaning of relevance in this context?
• Is it the same relevance that the IR community has dealt with for many decades in the context of document retrieval?
• Can we adopt existing IR models into this new area of Entity Oriented Search in a straightforward manner?
• In this talk I’ll try to deal with some of these questions
  • I’ll overview how the same questions are handled in related areas (especially in Q&A)
  • I’ll raise some research directions that might lead to a better understanding of the concept of relevance in EoS
What is an Entity?
• Entity: an object or a “thing” that can be uniquely identified in the world
  • An entity must be distinguishable from other entities
  • Can be anything (including an abstract thing!)
• Attributes: used to describe entities
  • An attribute contains a single piece of information
  • Key: a minimal set of attributes that uniquely identifies an entity
• Entity set: a set of entities of the same type and attributes
[Figure: ER diagram of an Actor entity with the attributes id, name, address, and birthday]
What is a Relationship?
• Relationship: an association among two or more entities
• A relationship may also have attributes
• Relationship set: a set of relationships of the same type
[Figure: ER diagram of a Prescription relationship, with the attribute Date, among Patient (id, name), Physician (id), and Medication (code, name)]
Example: ERD for Social Search in the Enterprise
Entity Relationship Graph (ERG)
Represents:
• Entity instances as graph nodes
• Binary relationships as (weighted) edges
• N-ary relations, broken into binary ones
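The slide does not say how an N-ary relation is broken into binary ones. The following is a minimal Python sketch under one assumption: every N-ary relationship instance is expanded into an edge between each pair of its participants (a node standing for the relationship instance itself would be another option), and an edge's weight counts how many relationship instances the pair shares. `build_erg` and the entity ids are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_erg(relations, weight=1.0):
    """Build an undirected, weighted entity relationship graph.

    relations: iterable of tuples of entity ids, one tuple per relationship
    instance (binary or N-ary). Each N-ary instance is broken into all of
    its binary pairs, and repeated pairs accumulate weight.
    """
    graph = defaultdict(dict)
    for participants in relations:
        for u, v in combinations(sorted(set(participants)), 2):
            graph[u][v] = graph[u].get(v, 0.0) + weight
            graph[v][u] = graph[v].get(u, 0.0) + weight
    return dict(graph)


# Hypothetical data: one ternary Prescription instance and one binary relationship.
erg = build_erg([
    ("patient:17", "physician:3", "medication:A12"),
    ("patient:17", "physician:3"),
])
print(erg["patient:17"])  # {'medication:A12': 1.0, 'physician:3': 2.0}
```

Pairwise expansion keeps the graph purely entity-to-entity, which suits the random-walk methods discussed later. The cost is that it loses which pairs came from the same relationship instance.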
Entity Oriented Search (EoS)
[Figure: EoS architecture. Entity Relationship Data → Entity Relationship Index (entities, relations) → Runtime (ranking, navigation, exploration). Input: a query (free text, entity, or hybrid query). Output: related entities and relationships.]
Query examples:
• Nikon D40
• Teammates of Michael Schumacher
• “Data mining”
The concept of Relevance in IR
The Classical Concept of Relevance in IR (Saracevic76, Mizzaro96)
Problem → Information Need → Request → Query → Judgment
• P: The user has a problem to solve or an aim to achieve
• IN: The user builds a mental, implicit representation of P (which may be incorrect or incomplete)
• R: The user expresses IN explicitly, usually in natural language (sometimes with the help of an intermediary)
• Q: Formalization: R is translated into a formal query understandable by the search system
• J: The same user judges the RELEVANCE of the search results
User-based (Subjective) Relevance
• Relevance is a dynamic concept that depends on the user’s subjective judgment
• Subjective relevance judgment may depend on:
  • The user’s characteristics and perceptions
    • Gender, age, education, income, occupation…
    • Preferences, interests
    • State of mind
  • The context of search
    • The user’s level of expertise (regarding the topic of interest)
    • Current time
    • Current location
    • Session status
    • Dependencies between retrieved items with respect to:
      • the specific query
      • sequential queries during the session
Topic-based relevance judgment
• How well the topic of the retrieved information matches the topic of the request
• An object is objectively relevant to a request if it deals with the topic of the request (aboutness)
• TREC working definition for relevance assessment:
  If you are writing a report on the topic and would use the information contained in the document in the report, then the document is considered relevant to the topic…
• A document is judged relevant if any piece of it is relevant, regardless of how small that piece is in relation to the rest of the document
Probability Ranking Principle
• Given a set of documents that “match” the entity-oriented query
  • How do we rank them for the user?
Document retrieval: $\Pr(R=1 \mid q, d)$; entity retrieval: $\Pr(R=1 \mid q, e)$
We need a reliable and coherent methodology for measuring the probability of relevance of an entity to a query
The Probability Ranking Principle (PRP) for document retrieval (Robertson 77):
• “If a retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request,
• where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose,
• the overall effectiveness of the system to its user will be the best…”
Relevance estimation in classic Document Retrieval
• Most relevance approximation approaches for document retrieval are based on measuring some kind of similarity between the user's query and the retrieved documents
  • Vector space:
    • The cosine of the angle between two vectors
  • Concept space:
    • Similarity in the latent concept space, e.g., LDA, LSI, ESA
  • Language models:
    • Similarity between the document and query term distributions
Can we use similar approaches for EoS?
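To make the vector-space bullet concrete, here is a minimal sketch of cosine similarity over raw term-frequency vectors. IDF weighting, stemming, and stop-word removal are left out as simplifications for illustration. Applied to the question and the answer from the JFK example in this talk, it shows why pure similarity can miss relevant entities: the two share no terms.

```python
import math
from collections import Counter


def cosine(query, doc):
    """Cosine of the angle between the raw term-frequency vectors of two texts."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0  # empty text: define similarity as 0


question = "Who killed JFK?"
answer = "Lee Harvey Oswald"
print(cosine(question, answer))  # 0.0: no shared terms
```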
Entity Similarity
• While similarity plays a central role in relevance estimation for document retrieval, many relevant entities are not similar to the queried entity
  • At least according to standard definitions of similarity
• This problem is well known in the Question Answering domain
  • The answer is not necessarily “similar” to the question
  • The supportive passage is not always similar to the question
  • Example: Who killed JFK?
John F. Kennedy (JFK), the thirty-fifth President of the United States, was assassinated at 12:30 p.m. Central Standard Time (18:30 UTC) on Friday, November 22, 1963, in Dealey Plaza, Dallas, Texas.
The ten-month investigation of the Warren Commission of 1963–1964 concluded that the President was assassinated by Lee Harvey Oswald.
Relevance Judgment in Question Answering
• In QA, we usually assume a question that identifies the information need “precisely”:
  • Who was the first American in space?
  • How many calories are there in a Big Mac?
  • How many Grand Slam titles did Bjorn Borg win?
• When will an answer be considered relevant to the question?
  • It must be correct!
  • i.e., it must have supportive evidence (from reliable sources)
A prominent challenge in answering a question lies not so much in finding an answer as in validating whether the candidate answer is correct
• Therefore, supportive evidence is essential
• Assessment instructions from TREC’s QA track:
  Assessors read each candidate answer and make a binary decision as to whether or not the candidate is actually an answer to the question in the context provided by the supportive document
What do you mean, the answer is correct?
As in document retrieval, correctness/relevance in QA might be subjective and user dependent
• Where is the Taj Mahal?
  • Agra, India? The famous mausoleum?
  • Atlantic City, NJ? The casino?
In TREC, it is common to consider any candidate answer with (relevant) supportive evidence as a correct one
• This suggests how various candidate answers can be ranked:
  • i.e., relevance judgment is transformed into judging the relevance of the supporting evidence
• This approach can be applied to Entity Oriented Search
  • Rank retrieved entities according to the amount and quality of their supportive evidence!
Entity ranking should be based on the supportive evidence for the entities’ relevance to the query
Relevance Estimation Approaches for EoS
The Expert Profile-based Approach (Craswell et al. 2001):
• Represent each person by a virtual document (a profile)
  • Employee directory (in the enterprise)
  • Concatenating all existing passages mentioning the person
• Rank those profiles according to their relevance to the query
  • Using standard IR ranking techniques
• The profile can naturally be used as supportive evidence of the person’s expertise
• Difficulties:
  • Co-reference resolution and name disambiguation
  • Privacy concerns
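Here is a minimal sketch of the profile-based approach. It assumes passages already come annotated with the person ids they mention, i.e., that the co-reference and name-disambiguation difficulty above has been solved. A simple TF-IDF score stands in for "standard IR ranking techniques". `build_profiles`, `rank_profiles`, and the data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_profiles(passages):
    """passages: (text, [person ids mentioned]) pairs.
    Returns person -> bag of words over all passages mentioning that person."""
    profiles = defaultdict(Counter)
    for text, people in passages:
        terms = text.lower().split()
        for person in set(people):
            profiles[person].update(terms)  # concatenate the passage into the profile
    return profiles


def rank_profiles(query, profiles):
    """Rank profiles with a simple TF-IDF score (a stand-in for any standard IR model)."""
    n = len(profiles)
    scores = {}
    for person, tf in profiles.items():
        score = 0.0
        for t in query.lower().split():
            if tf[t]:  # Counter returns 0 for missing terms
                df = sum(1 for p in profiles.values() if t in p)
                score += (1 + math.log(tf[t])) * math.log(1 + n / df)
        scores[person] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical passages annotated with the people they mention.
profiles = build_profiles([
    ("alice wrote the search ranking paper", ["alice"]),
    ("bob and alice discussed ranking", ["alice", "bob"]),
    ("bob organized the picnic", ["bob"]),
])
ranking = rank_profiles("ranking", profiles)
print([p for p, _ in ranking])  # ['alice', 'bob']
```

Because each profile is an ordinary bag of words, the passages that built it can be shown to the user as supportive evidence for the person's rank.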
EoS: Voting approach (Balog06, MacDonald09)
• Any relevant document is a “voter” for the entities it mentions or relates to
• What is the rationale behind it?
  • An entity mentioned many times in relevant (top-retrieved) documents is more likely to be relevant to the given topic
$$Score(q, p) = \sum_{d} Score(q, d) \cdot Score(d, p)$$
[Figure: query q linked to documents d1, d2, d3, which are in turn linked to persons p1, p2, p3]
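The voting formula translates directly into code. In this sketch, Score(q, d) and Score(d, p) are given as inputs (for example, retrieval scores and document-entity association weights), and how they are computed is left open. The function name `vote` and the data are hypothetical.

```python
from collections import defaultdict


def vote(doc_scores, doc_entity_scores):
    """Score(q, p) = sum over d of Score(q, d) * Score(d, p).

    doc_scores: {doc: Score(q, d)} for the top-retrieved documents.
    doc_entity_scores: {doc: {entity: Score(d, p)}} document-entity association.
    """
    entity_scores = defaultdict(float)
    for d, s_qd in doc_scores.items():
        for p, s_dp in doc_entity_scores.get(d, {}).items():
            entity_scores[p] += s_qd * s_dp  # each relevant document votes for its entities
    return sorted(entity_scores.items(), key=lambda kv: kv[1], reverse=True)


doc_scores = {"d1": 0.9, "d2": 0.6, "d3": 0.3}  # hypothetical retrieval scores
associations = {"d1": {"p1": 1.0, "p2": 0.5}, "d2": {"p2": 1.0}, "d3": {"p3": 1.0}}
ranking = vote(doc_scores, associations)
print([p for p, _ in ranking])  # ['p2', 'p1', 'p3']
```

p2 outranks p1 even though p1 is linked to the single best document, because p2 collects votes from two documents.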
Relevance Propagation (Serdyukov 2008)
• We should also consider entities that are indirectly related to the query
  • Relevance is propagated through the entity relationship graph
[Figure: entity relationship graph linking query q, documents d1 to d4, and persons p1 to p4]
How should relevance be propagated in the graph?
Proximity in the Entity Relationship Graph - Random walks
• Random walk approach
  • The relationship strength between two nodes is reflected by the probability that a random surfer who starts at one node will visit the second one during the walk
• Justification
  • The more paths connect the two entities in the graph, the higher the probability that the surfer will visit the target entity, and hence the higher the relationship strength between the two
Popular Random Walk Approaches
• SimRank(u,v): how soon two random surfers (starting at u and v) are expected to meet at the same node
• Random Walk with Restart (RWR): the surfer has a fixed restart probability of returning to the source
• Lazy Random Walk: the surfer has a fixed probability of halting the walk at each step
• Effective Conductance: considers only simple (cycle-free) paths, treating edges as resistors
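Of the approaches listed, Random Walk with Restart is the simplest to sketch. The code below uses power iteration to compute the visit probabilities of a surfer who restarts at the query node with a fixed probability (0.15 here, an assumed value). Otherwise the surfer follows an edge chosen in proportion to its weight. The graph and the function name are hypothetical.

```python
def random_walk_with_restart(graph, source, restart=0.15, max_iter=200, tol=1e-10):
    """Visit probabilities of a surfer that restarts at `source`.

    At every step the surfer jumps back to `source` with probability
    `restart`; otherwise it follows an outgoing edge chosen in proportion
    to its weight. graph: {node: {neighbour: weight}}.
    """
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    p = {n: 0.0 for n in nodes}
    p[source] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] = restart  # restart mass (total probability mass is 1)
        for u, mass in p.items():
            nbrs = graph.get(u, {})
            total = sum(nbrs.values())
            if total == 0:
                # Dangling node: its walking mass also returns to the source.
                nxt[source] += (1 - restart) * mass
                continue
            for v, w in nbrs.items():
                nxt[v] += (1 - restart) * mass * w / total
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


graph = {  # hypothetical undirected graph; each edge listed in both directions
    "q": {"d1": 1.0, "d2": 1.0},
    "d1": {"q": 1.0, "p1": 1.0},
    "d2": {"q": 1.0, "p1": 1.0, "p2": 1.0},
    "p1": {"d1": 1.0, "d2": 1.0},
    "p2": {"d2": 1.0, "p4": 1.0},
    "p4": {"p2": 1.0},
}
scores = random_walk_with_restart(graph, "q")
print(sorted(("p1", "p2", "p4"), key=scores.get, reverse=True))  # ['p1', 'p2', 'p4']
```

This matches the justification above. p1, reachable through two documents, scores higher than p2, which is reachable through one. p4, which is only indirectly related to the query, still receives some relevance.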
Markov Random Fields for EoS (Raviv, Carmel, Kurland, 2012)
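$$P(E \mid Q) \overset{\text{rank}}{=} \sum_{P \in \{D, T, N\}} \lambda_P \, P(E_P \mid Q), \qquad Q = \langle \{q_1, \ldots, q_n\}, T \rangle$$
where the query Q consists of its terms q_1, …, q_n and its type T, and E_D, E_T, and E_N denote the entity's document, type, and name.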
MRF based Entity Document Scoring P(ED|Q)
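• We consider three types of cliques:
  • Full independence
  • Sequential dependence
  • Full dependence
• The feature function over cliques
  • measures how well the clique's terms represent the entity document
  • is based on a Dirichlet-smoothed language model
  • For the dependence models, we replace q_i with #1(q_i … q_{i+k}) for ordered-window (O) cliques and with #uwN({q_i, …, q_j}) for unordered-window (U) cliques, respectively
• The entity document scoring function aggregates the feature functions over all clique types:
$$P(E_D \mid Q) \overset{\text{rank}}{=} \sum_{I \in \{T, O, U\}} \lambda^D_I \sum_{c \in I} f^D_I(c, E_D)$$
$$f^D_T(q_i, E_D) = \log \frac{tf(q_i, E_D) + \mu \cdot cf(q_i)/|C|}{|E_D| + \mu}$$
where tf(q_i, E_D) is the frequency of q_i in the entity document, cf(q_i) its frequency in the collection C, and μ the Dirichlet smoothing parameter.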
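The unigram feature function f^D_T can be sketched as follows. The sketch covers only the full-independence case and omits the ordered and unordered window cliques (#1 and #uwN). The default μ = 2500 is an assumed value, and the entity documents are hypothetical bags of words. The function also assumes that every query term occurs somewhere in the collection.

```python
import math
from collections import Counter


def dirichlet_feature(term, entity_doc, collection_tf, collection_len, mu=2500.0):
    """f_T^D(q_i, E_D) = log((tf(q_i, E_D) + mu * cf(q_i) / |C|) / (|E_D| + mu)).

    Assumes cf(q_i) > 0, i.e. every query term occurs somewhere in the collection.
    """
    tf = entity_doc[term]
    background = collection_tf[term] / collection_len  # cf(q_i) / |C|
    return math.log((tf + mu * background) / (sum(entity_doc.values()) + mu))


def full_independence_score(query_terms, entity_doc, collection_tf, collection_len, mu=2500.0):
    """Sum of the single-term (T) clique features only; no O/U window cliques."""
    return sum(dirichlet_feature(q, entity_doc, collection_tf, collection_len, mu)
               for q in query_terms)


# Hypothetical entity documents (bags of words).
schumacher = Counter("formula one driver ferrari champion".split())
nikon_d40 = Counter("digital camera lens nikon".split())
collection = schumacher + nikon_d40
size = sum(collection.values())
for name, doc in [("schumacher", schumacher), ("nikon_d40", nikon_d40)]:
    print(name, round(full_independence_score(["ferrari", "driver"], doc, collection, size, mu=10), 3))
```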
Entity type Scoring P(ET|Q)
• We measure the “similarity” between the query type and the entity type:
$$P(E_T \mid Q) \overset{\text{rank}}{=} f_T(c) = \log \frac{e^{-\alpha\, d(Q_T, E_T)}}{\sum_{E' \in R} e^{-\alpha\, d(Q_T, E'_T)}}$$
• d(Q_T, E_T), the type distance, is domain dependent
  • In our experiments, we measured the distance in the Wikipedia category graph
  • The minimal path length between all pairs of the query's categories and the entity page's categories
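Here is a minimal sketch of type scoring. It rests on several assumptions: the category graph is treated as undirected, d(Q_T, E_T) is found with a multi-source breadth-first search from the query's categories, R is taken to be the set of candidate entities being scored, and α = 1. The category names and entities are hypothetical.

```python
import math
from collections import deque


def min_category_distance(graph, query_cats, entity_cats):
    """Minimal path length (in edges) between any query category and any
    entity category, via multi-source BFS over an undirected category graph."""
    targets = set(entity_cats)
    dist = {c: 0 for c in query_cats}
    queue = deque(query_cats)
    while queue:
        c = queue.popleft()  # BFS pops nodes in non-decreasing distance order
        if c in targets:
            return dist[c]
        for nb in graph.get(c, ()):
            if nb not in dist:
                dist[nb] = dist[c] + 1
                queue.append(nb)
    return math.inf  # no path between the two sets of categories


def type_scores(distances, alpha=1.0):
    """f_T = log(exp(-alpha * d(Q_T, E_T)) / sum over E' in R of exp(-alpha * d(Q_T, E'_T)))."""
    z = sum(math.exp(-alpha * d) for d in distances.values())
    return {e: -alpha * d - math.log(z) for e, d in distances.items()}


categories = {  # hypothetical, undirected adjacency lists
    "Formula One drivers": ["Racing drivers"],
    "Racing drivers": ["Formula One drivers", "Sportspeople"],
    "Sportspeople": ["Racing drivers"],
    "Digital cameras": [],
}
query_types = ["Formula One drivers"]
distances = {
    "Michael Schumacher": min_category_distance(categories, query_types, ["Formula One drivers"]),
    "athlete:X": min_category_distance(categories, query_types, ["Sportspeople"]),
    "Nikon D40": min_category_distance(categories, query_types, ["Digital cameras"]),
}
print(distances)  # {'Michael Schumacher': 0, 'athlete:X': 2, 'Nikon D40': inf}
scores = type_scores(distances)
```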
Entity Name Scoring P(EN|Q)
• We measure the dependency between the query term(s) and the entity name
  • Globally
    • Measure the proximity between the query term(s) and the entity name in the whole collection
    • We use pointwise mutual information (PMI): the likelihood of finding one term in proximity to another term
  • Locally
    • Measure the proximity between the query terms and the entity name in the top retrieved documents
$$P(E_N \mid Q) = \sum_{X \in A} \lambda^N_X \sum_{c \in X} f^N_X(c), \qquad A = \{S, T, O, U, PMI_T, PMI_O, PMI_U\}$$
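Global name scoring relies on PMI. This sketch estimates the probabilities from co-occurrence within text windows (for example, sentences) and treats the entity name as a single token. Both are simplifying assumptions, and the windows are hypothetical.

```python
import math


def pmi(term, name, windows):
    """PMI(term, name) = log(P(term, name) / (P(term) * P(name))).

    Probabilities are estimated as the fraction of text windows (sets of
    tokens) that contain the term, the name, or both. Returns -inf when the
    two never co-occur.
    """
    n = len(windows)
    n_term = sum(1 for w in windows if term in w)
    n_name = sum(1 for w in windows if name in w)
    n_both = sum(1 for w in windows if term in w and name in w)
    if n_both == 0:
        return float("-inf")
    return math.log(n_both * n / (n_term * n_name))


# Hypothetical sentence-level windows; names are single tokens for simplicity.
windows = [
    {"ferrari", "schumacher", "won"},
    {"ferrari", "schumacher", "race"},
    {"nikon", "camera"},
    {"ferrari", "red"},
]
print(round(pmi("ferrari", "schumacher", windows), 3))  # 0.288
```

A positive value means the term and the name co-occur more often than independence would predict. That is the signal the global PMI features contribute to the name score.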
Experimental Results over INEX Entity track (2007-2009)
[Figure: three bar charts (Full Independence, Sequential dependence, Full dependence), each comparing S(ED), S(ED,ET), S(ED,ET,EN), and the top INEX result for 2007, 2008, and 2009; y-axis from 0 to 0.4]
• Results improved significantly when type and name scoring were added
• The final results are superior to the top INEX results for 2007 and 2008, and comparable for 2009
• Why did the dependence models not improve over the independence model?
Exploratory EoS
• When only an entity is given as input, the information need is quite fuzzy
  • Any related entity has the potential to be relevant
  • Therefore, any related entity should be retrieved!
  • High diversity in search results (entity types, relationship types)
• How can we help the user find the most relevant answers?
  • Iterative IR: let the user navigate and explore the ER graph
  • Facet search:
    • Categorize the search results according to their facets (entity types/attributes…)
    • Let the user drill down: restrict the retrieved entities to a specific facet
    • NOTE: We still need to rank the search results in each of the facets!
  • Graph navigation:
    • Let the user explore the graph by using a retrieved entity as a pivot for a new search
    • Query reformulation
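Facet search with per-facet ranking, as described above, can be sketched as grouping the scored results and then sorting each group. Here the facet is simply the entity type, and the scores come from whatever ranking model is in use. The function names and data are hypothetical.

```python
from collections import defaultdict


def facet(results):
    """Group (entity, facet, score) triples by facet; rank each facet by score."""
    groups = defaultdict(list)
    for entity, facet_value, score in results:
        groups[facet_value].append((entity, score))
    return {f: sorted(items, key=lambda item: item[1], reverse=True)
            for f, items in groups.items()}


def drill_down(results, facet_value):
    """Restrict the retrieved entities to one facet, keeping the within-facet ranking."""
    return facet(results).get(facet_value, [])


# Hypothetical mixed-type results for a query such as 'social'.
results = [
    ("person:1", "person", 0.90),
    ("blog:7", "blog", 0.80),
    ("person:2", "person", 0.95),
    ("tag:social", "tag", 0.70),
]
print(drill_down(results, "person"))  # [('person:2', 0.95), ('person:1', 0.9)]
```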
Search over Social Media Data (SaND) – (Carmel 2009, Guy 2010)
• SaND provides social aggregation over social data
• SaND builds an entity-entity relationship matrix that maps a given entity to all related entities, weighted by their relationship strength
  • Direct relations of a user to:
    • a document: as an author, tagger, or commenter
    • another user: as a friend or as a manager/employee
    • a tag: one she used, or one others tagged her with
    • a group: as a member/owner
  • Indirect relations:
    • Two entities are indirectly related if both are directly related to the same entity
• The overall relationship strength between two entities is determined by a linear combination of their direct and indirect relationship strengths
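The slide states that SaND combines direct and indirect relationship strengths linearly, but not how the indirect strength is computed. This sketch assumes the indirect strength is the sum, over shared neighbours, of the product of the two direct weights, and it uses an arbitrary β = 0.7. It is an illustration, not SaND's actual implementation.

```python
from collections import defaultdict


def relationship_strength(direct, beta=0.7):
    """Overall strength = beta * direct + (1 - beta) * indirect.

    direct: {entity: {related_entity: weight}}, assumed symmetric.
    Indirect strength of (a, b): sum over shared neighbours c of
    direct[a][c] * direct[c][b] (a hypothetical aggregation choice).
    """
    overall = defaultdict(lambda: defaultdict(float))
    for a, neighbours in direct.items():
        for b, w in neighbours.items():
            overall[a][b] += beta * w  # direct component
        for c, w_ac in neighbours.items():
            for b, w_cb in direct.get(c, {}).items():
                if b != a:
                    overall[a][b] += (1 - beta) * w_ac * w_cb  # indirect, via c
    return overall


# Hypothetical data: Alice authored doc:1 and Bob commented on it.
direct = {
    "user:alice": {"doc:1": 1.0},
    "user:bob": {"doc:1": 0.5},
    "doc:1": {"user:alice": 1.0, "user:bob": 0.5},
}
strength = relationship_strength(direct)
print(round(strength["user:alice"]["user:bob"], 3))  # 0.15, via doc:1 only
```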
Search for the term ‘social’:
Results contain different types of entities: blogs, communities, bookmarked documents, etc.
Popular, higher-ranked results appear higher in the result set.
Related People: a ranked list of people who are related to the topic and to the result set through one or more relationship types (author, commenter, tagger, etc.)
Related Tags: a ranked tag cloud for this result set.
Hovering over a result highlights the related people and tags.
Narrowing the search to results related to Luis Suarez.
Viewing results for the query ‘social’ and the person ‘Luis Suarez’.
Viewing Luis’s business card and the results related to him.
Summary
• In this talk, we raised several questions related to the concept of relevance in EoS:
  • What makes an entity relevant to the user’s need?
  • What is the meaning of relevance in this context?
  • Is it the same notion of relevance used in document retrieval?
• We argue that the relevance of an entity can be estimated according to the supportive evidence provided by the search system
• We discussed common EoS retrieval techniques:
  • The profile-based approach
  • The voting approach
  • Relevance propagation
• We discussed several examples of EoS systems and how relevance estimation can be applied in these domains
• We claimed that the scale and diversity of EoS search results demand exploratory search techniques such as facet search and graph navigation
Open Questions and Challenges
• Entity Similarity
  • While similarity plays a central role in relevance judgment in document retrieval, entity similarity measurement still needs to be better understood
  • Attribute-based similarity, evidence-based similarity
  • Graph proximity
  • Hybrid approaches
• The clustering hypothesis:
  • Are two “similar” entities likely to be relevant to the same information need?
  • Challenges:
    • To what extent are relevant entities indeed similar to each other,
    • and according to which similarity measure?
• Relevance propagation: Which relationship types provide effective relevance propagation channels?
  • Do your friends inherit your expertise?
  • Which relationship types contribute to relevance propagation?
Thank You!
Questions?