Social Search and Discovery Using a Unified Approach
Einat Amitay et al.
IBM Research Lab in Haifa, Israel
HT’09
18 March 2011
Presentation @ IDB Lab Seminar
IDB Tagging Team, School of CSE, SNU
Presented by Kangpyo Lee
2
A Variety of Web Search Types
Social Search
Personalized Search
Unified Search
Universal Search
Multi-entity Search
Faceted Search
Multi-faceted Search
Exploratory Search
Vertical Search
3
Outline
Introduction
Related Work
Implementation
Social Search within the Enterprise
User Study
Summary
4
Introduction
Recent Web 2.0 applications (e.g., web logs, collaborative bookmarking systems, and social networks) introduce new entities & relations in addition to regular web pages
Web 2.0 entities relate to each other in several ways
– Documents may relate to other documents by referencing each other
– A user may relate to a document as its author, as a tagger, or by being mentioned in the page’s content
– A user may relate to other users through social relations
– A tag relates to the bookmark it is associated with, and also to the tagger
These entities & relations may prove valuable in enhancing the search experience
– By serving as potential search results
– By influencing ranking algorithms
5
Introduction
We present and evaluate novel methods for leveraging social information to enhance search results and discover relations between Web 2.0 applications
Our approach leverages a unified representation of the entities and their relations
We then use this intricate heterogeneous collection to establish an all-encompassing social search solution
6
Introduction
Social search solution
– Allows users to query for specific entities and retrieve results of all relevant types
– The system returns, in addition to standard search results, users related to the query, as well as tags that are associated with relevant documents
– These tags can be further used to categorize the search results and to better refine the searcher’s information need
We use the term social search engine to describe this multi-entity search system based on “social” data
Our social search system is the only one that provides a unified approach for searching and retrieving entities of all types
7
Introduction
- Unified Approach
Our social data include records of users’ public activity with documents
– such as bookmarking, tagging, rating, or comments made to other public Web 2.0 entities
Our system allows the search for any object type (e.g., document, person, or tag) and the retrieval of all entity types
The system supports
– Standard textual queries
– Entity queries
– Any combination of the two
8
Introduction
- Unified Approach
The social search engine is based on the unified search approach
Unified search
– A.k.a. heterogeneous interrelated entity search
– An emerging paradigm within IR
– The search space is expanded to represent heterogeneous information about objects that may relate to each other in several ways
    Direct relations
    Indirect relations
The system must be scalable, responsive, and reflect the rapid update patterns typical in Web 2.0 systems
9
Introduction
- Unified Approach
We present a novel realization of the unified search paradigm based on multifaceted search
– Represents each of the system’s entities by a retrievable document
– Direct relations between entities are represented by marking one of the elements as a “facet” of its counterpart
– The strength of the relationship between the two objects is represented by the strength of the document-facet relationship
[Figure: A and B connected by a direct relation]
- A is one of B’s facets
- B is one of A’s facets
10
Introduction
- Unified Approach
The approach offers an efficient mechanism for updating relations between objects as well as efficient search over the heterogeneous data
– Only direct relations between objects need to be updated when new entities are added
– Indirect relations are dynamically induced from the direct relations and computed on the fly at query execution time
– Directly related objects are retrieved and scored at run time using the search engine’s regular scoring mechanisms
– Indirectly related entities are retrieved and scored using an implementation of faceted search
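The update behaviour described on this slide can be illustrated with a minimal sketch. It assumes a hypothetical in-memory store (`direct`, `add_relation`, `indirect_strength` and the entity ids are illustrative names, not part of the system presented): adding an entity only writes direct relations, and the order-two strength is derived when it is asked for.

```python
from collections import defaultdict

# Hypothetical store: direct[a][b] is the strength of the direct relation a-b.
direct = defaultdict(dict)

def add_relation(a, b, weight=1.0):
    """Record a symmetric direct relation; no indirect data is stored."""
    direct[a][b] = weight
    direct[b][a] = weight

def indirect_strength(a, b):
    """Order-two strength between a and b, computed at query time
    by summing over every object directly related to a."""
    return sum(w * direct.get(mid, {}).get(b, 0.0)
               for mid, w in direct.get(a, {}).items())

add_relation("user:alice", "page:p1")
add_relation("tag:search", "page:p1")
before = indirect_strength("user:alice", "tag:search")

# A new bookmark only adds direct edges; the indirect strength follows.
add_relation("user:alice", "page:p2", 0.5)
add_relation("tag:search", "page:p2", 2.0)
after = indirect_strength("user:alice", "tag:search")
print(before, after)
```

Because nothing indirect is materialized, an update costs only as much as the number of direct relations it touches.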
12
Related Work
Social search
– The set of annotations provided by the public can be used to enrich the page content
– The # of annotations of a web page can be used as additional evidence of document quality for improved ranking of search results
– Social data enables users to search for other people with whom they maintain relationships in the network
Social ranking
– Ranking all entities retrieved by the social search engine
– FolkRank and SocialPageRank
– The cost of a PageRank-like computation depends heavily on the graph size and is expected to be very slow
– Different entity types provide different retrieval values for the searcher, hence they should be ranked according to their own characteristics
13
Related Work
- Multi-Entity Search
Multi-entity search
– Extending basic search functionality by answering user queries with many types of entities
– Usually based on analysis of the relationship between entities and documents relevant to the query
Searching over a multi-entity graph
– Nodes are entities (terms, documents, persons, annotations)
– Edges are the relations between the entities
SimFusion uses a Unified Relationship Matrix (URM) to represent the multi-entity graph
14
Related Work
- Multi-Entity Search
Unified Relationship Matrix (URM)
– Relations between two object types are represented via a relationship matrix M_ij
– The (k, l) entry of matrix M_ij represents the strength of the relation between the object pair (o_k, o_l) of types O_i and O_j respectively
– The URM matrix U
    Encapsulates all matrices to provide a unified representation of the unified search space
    Provides relationship strength between any two directly related entities, along with a theoretically elegant way to calculate indirect relations through matrix multiplication
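As a rough illustration of the URM idea, the sketch below assembles U from per-type relationship matrices and squares it to obtain order-two relations. The toy data (two pages, one user, one tag), the object ordering and the helper names are assumptions made for this example, and plain Python lists stand in for a real matrix library.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def hstack(*blocks):
    """Concatenate blocks with equal row counts side by side."""
    return [sum((list(b[i]) for b in blocks), [])
            for i in range(len(blocks[0]))]

# Hypothetical toy data; object order is [p1, p2, u1, t1].
M_pu = [[1.0], [0.0]]   # page x user: u1 bookmarked p1
M_pt = [[1.0], [1.0]]   # page x tag: t1 is attached to p1 and p2
M_ut = [[0.0]]          # user x tag: no direct relation recorded

# U stacks the relationship matrices (and their transposes) into one matrix.
U = (hstack(zeros(2, 2), M_pu, M_pt)
     + hstack(transpose(M_pu), zeros(1, 1), M_ut)
     + hstack(transpose(M_pt), transpose(M_ut), zeros(1, 1)))

# Squaring U yields the order-two (indirect) relation strengths.
U2 = matmul(U, U)
print(U[2][3], U2[2][3])  # direct vs. indirect strength between u1 and t1
```

In this toy graph the user and the tag have no direct relation, yet U2 links them through page p1; the two pages are likewise linked through the shared tag.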
16
Implementation
Our solution to unified search represents each object in the system in two ways
– (1) as a retrievable document
– (2) as a facet (category) of all the objects to which it relates
A unified representation of a collaborative bookmarking system
– Three object types – web pages, users, and tags
– Each object type is associated with a corresponding document – a web page document, a user document, and a tag document
– Three relationship types
    A user-type facet between a user & the tagged web page
    A tag-type facet between a tag & the associated web page
    A user-type facet between a user & a tag used for bookmarking
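The document-plus-facet representation of a bookmarking system might be sketched as follows. The `EntityDoc` class, the `index` dictionary, the `add_bookmark` helper and the use of occurrence counts as facet strengths are all assumptions for illustration; the slides do not specify how the strengths are set.

```python
from dataclasses import dataclass, field

@dataclass
class EntityDoc:
    """A retrievable document standing for one entity."""
    doc_id: str
    kind: str                  # "page", "user", or "tag"
    text: str = ""
    # (facet kind, facet id) -> relation strength (assumed: a count)
    facets: dict = field(default_factory=dict)

index = {}  # hypothetical in-memory index: doc_id -> EntityDoc

def get_doc(doc_id, kind, text=""):
    return index.setdefault(doc_id, EntityDoc(doc_id, kind, text))

def bump(doc, facet_kind, facet_id):
    key = (facet_kind, facet_id)
    doc.facets[key] = doc.facets.get(key, 0.0) + 1.0

def add_bookmark(user, page, tags, page_text=""):
    """Index one bookmark using the three relationship types listed above."""
    page_doc = get_doc(page, "page", page_text)
    get_doc(user, "user")
    bump(page_doc, "user", user)          # user facet on the tagged page
    for tag in tags:
        tag_doc = get_doc(tag, "tag")
        bump(page_doc, "tag", tag)        # tag facet on the page
        bump(tag_doc, "user", user)       # user facet on the tag

add_bookmark("alice", "http://example.org", ["search", "ir"])
add_bookmark("bob", "http://example.org", ["search"])
print(index["http://example.org"].facets)
```

Every object ends up both as a document in `index` and as a facet key on the documents it relates to, which is the dual role the slide describes.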
17
Implementation
- Scoring Indirectly Related Objects
The strength of the indirect relation between objects o1 & o2
– U(o, o’) – the corresponding entry in the URM matrix
– Equivalent to squaring the URM matrix
    Provides the relationship strength of order two between any two objects
Eq. 1 can be generalized to score objects based on their indirect relations with any query
– The score vector s0(q) provides the direct scores of all N objects in the system with respect to the query
– The score vector s1(q) provides the indirect scores of all objects
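Read with the definitions above, Eq. 1 and its generalization can plausibly be written as follows. This is a reconstruction from the surrounding text (squaring the URM, the score vectors s0 and s1), not a copy of the slide's formulas; the summation index o' and the component notation s(q)[o] are notational assumptions.

```latex
% Order-two relation strength: an entry of the squared URM
U^2(o_1, o_2) = \sum_{o'} U(o_1, o')\, U(o', o_2)

% Generalization to a query q: indirect scores derived from direct scores
s_1(q) = U\, s_0(q), \qquad
s_1(q)[o] = \sum_{i=1}^{N} U(o, o_i)\, s_0(q)[o_i]
```

The second line is presumably the relation the next slides call Eq. 2: an object's indirect score is the sum of the direct scores of its neighbours, weighted by the relation strengths.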
18
Implementation
- Scoring Indirectly Related Objects
In addition, objects can be scored according to their relative popularity, or authority
– FolkRank or SocialPageRank can be used
– Inverse entity frequency (ief) score
    N – the # of all objects in the system
    N_o – the # of objects directly related to o
    Penalizes objects that are related to many objects in general
The final score of object o for a query q
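A small sketch of the ief idea, assuming it takes the form log(N / N_o) by analogy with inverse document frequency. The exact formula is not preserved in this transcript, so both the logarithmic form and the function name `ief` are assumptions.

```python
import math

def ief(n_total, n_related):
    """Inverse entity frequency, assumed here to be log(N / N_o).

    n_total   -- N, the number of all objects in the system
    n_related -- N_o, the number of objects directly related to o
    """
    if n_related <= 0 or n_related > n_total:
        raise ValueError("expected 0 < N_o <= N")
    return math.log(n_total / n_related)

# Hypothetical counts: a rarely used tag vs. a very common one.
rare = ief(700_000, 10)
common = ief(700_000, 100_000)
print(rare > common)
```

Under this assumed form, an object related to everything gets a score of zero, which matches the stated intent of penalizing objects that are related to many objects in general.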
19
Implementation
- Multifaceted Search
Multifaceted search aims to combine the two main search approaches:
– Direct search
– Navigational search – offering navigational refinement on the results by categorizing the search results into predefined facets along with the counts of results per facet
Multifaceted search has become the prevailing user interaction mechanism in e-commerce sites
– Now being extended to deal with semi-structured data, continuous dimensions, and folksonomies
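The navigational side of multifaceted search, counting results per facet value and narrowing by a chosen value, can be sketched in a few lines. The result records, facet names and helper functions below are hypothetical and only illustrate the counting and refinement steps.

```python
from collections import Counter, defaultdict

# Hypothetical result set: each hit lists its facet values by facet name.
results = [
    {"id": "p1", "facets": {"tag": ["search", "ir"], "user": ["alice"]}},
    {"id": "p2", "facets": {"tag": ["search"], "user": ["bob"]}},
    {"id": "p3", "facets": {"tag": ["web2.0"], "user": ["alice"]}},
]

def facet_counts(hits):
    """Count, per facet name, how many hits carry each facet value."""
    counts = defaultdict(Counter)
    for hit in hits:
        for name, values in hit["facets"].items():
            counts[name].update(set(values))  # count each hit once per value
    return counts

def refine(hits, name, value):
    """Navigational refinement: keep only hits carrying the chosen value."""
    return [h for h in hits if value in h["facets"].get(name, [])]

counts = facet_counts(results)
narrowed = refine(results, "tag", "search")
print(counts["tag"].most_common(), [h["id"] for h in narrowed])
```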
20
Implementation
- Multifaceted Search
The scores of directly related objects are equivalent to the scores as represented by s0(q)
The score of an indirectly related object, o, is computed by summing, over all matching documents, its relationship strength with each document multiplied by that document’s direct score
– w(o, o_i) – the relationship strength between the document o_i & its facet o
– Equivalent to Eq. 2 since w(o, o_i) = U(o, o_i)
Indirectly related objects are represented by accumulating all facets of the same type
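The aggregation described on this slide can be sketched as a single pass over the matching documents. The function name, the dictionary layout and the example scores are assumptions; only the rule itself (sum of w(o, o_i) times the direct score of o_i, grouped by facet type) comes from the slide.

```python
from collections import defaultdict

def indirect_scores(direct_scores, facets):
    """Aggregate facet scores over the documents that match a query.

    direct_scores -- doc id -> direct score s0 for the query
    facets        -- doc id -> {(facet type, facet id): strength w}
    Returns facet type -> {facet id: sum of w * s0 over matching docs}.
    """
    out = defaultdict(lambda: defaultdict(float))
    for doc_id, s0 in direct_scores.items():
        for (ftype, fid), w in facets.get(doc_id, {}).items():
            out[ftype][fid] += w * s0
    return out

# Hypothetical data: p1 and p2 match the query, p3 does not.
s0 = {"p1": 0.5, "p2": 0.25}
facets = {
    "p1": {("user", "alice"): 1.0, ("tag", "search"): 2.0},
    "p2": {("user", "bob"): 1.0, ("tag", "search"): 1.0},
    "p3": {("tag", "unrelated"): 1.0},
}
scores = indirect_scores(s0, facets)
print(dict(scores["tag"]), dict(scores["user"]))
```

Only facets of matching documents are visited, so the cost follows the size of the result set and not the size of the whole relation graph.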
21
Implementation
- Efficiency Factors
Two issues regarding use of the URM matrix for social search
– 1) the need for efficient computation of indirect relations
– 2) efficient dynamic updates
The universal query (q = ‘*’), which retrieves all the objects indexed by the system as well as all objects related to them, has a query runtime of less than four seconds
Dynamic updates are handled by storing the changes in an external database
23
Social Search within the Enterprise
[Figure panels: Textual Query; Entity Query]
24
Social Search within the Enterprise
- Social Data & Social Search Application
Web 2.0 services of IBM
– Dogear – a collaborative bookmarking service (373,821 bookmarks, 234,856 web pages)
– BlogCentral – a central blog service (77,930 blog threads)
– BluePages – the enterprise directory and employee profile application (15,779 IBMers)
– About 700,000 unique entities
– Cow Search – the social search application available to all users of IBM’s intranet
26
User Study
Our goal was to measure both the quality of the returned document set and that of the related users and tags
– The evaluation methodologies for documents are well known and have standard measures
– There are no standard ways of measuring the quality of related users or tags
A user study was thus used
– The retrieved documents were examined and marked with three relevance levels (0-not relevant, 1-marginally relevant, 2-highly relevant)
– The quality of search results was measured by the normalized discounted cumulative gain (NDCG) measure
– To evaluate the effectiveness of the related people, we emailed 612 random users and asked them to rate on a Likert scale of 1 to 5
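For reference, NDCG over the three relevance levels used in the study can be computed as in the sketch below. It assumes one common variant (gain equal to the relevance grade, discount log2(rank + 1), ideal ranking taken from the same judged list); the slides do not say which variant or cutoff was used.

```python
import math

def dcg(rels):
    """Discounted cumulative gain of graded relevances in rank order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(rels, start=1))

def ndcg(rels, k=None):
    """DCG of the top-k results divided by the DCG of the ideal ordering."""
    rels = list(rels)
    top = rels[:k] if k else rels
    ideal = sorted(rels, reverse=True)[:len(top)]
    best = dcg(ideal)
    return dcg(top) / best if best > 0 else 0.0

# Hypothetical judgments (0, 1 or 2) for one query, in ranked order.
good_ranking = [2, 1, 0]
poor_ranking = [0, 1, 2]
print(ndcg(good_ranking), ndcg(poor_ranking))
```

A ranking that places the highly relevant documents first reaches the maximum of 1, while the reversed ranking scores strictly lower.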
27
User Study
- Results
Social data contribution to enterprise search
– We measure the quality of search results using manual assessments of the top-k search results for the 50 chosen queries
28
User Study
- Results
Related users
Related tags
30
Summary
Social data is valuable
– 1. The high precision of top retrieved documents demonstrates that user feedback identifies high quality content in the corpus
– 2. User comments and tags are highly beneficial in general and augment the description of system entities, while providing additional evidence for object popularity
Future research
– Exploiting personal social networks for search personalization
– Document or tag recommendations
– Quantifying the contribution of social objects to the effectiveness of the search system
Thank You!
Any Questions or Comments?