Chapter 10 - Brigham Young University, cs453ta/notes/ch10.pdf
Chapter 10
Social Search

Social Search
Social search
Communities of users actively participating in the search process
Goes beyond classical search tasks
Key differences
Users interact with the system
Users interact with other users implicitly/explicitly

Web 2.0
Social search includes, but is not limited to, the so-called social media sites
Collectively referred to as “Web 2.0”, as opposed to the classical notion of the Web (“Web 1.0”)
Social media sites
User-generated content
Users can tag their own and others' content
Users can share favorites, tags, etc., with others
Examples: Digg, Twitter, Flickr, YouTube, Del.icio.us, CiteULike, MySpace, Facebook, and LinkedIn

Social Search Topics
User tags
Searching within communities
Adaptive filtering
Recommender systems
Peer-to-peer and metasearch

User Tags & Manual Indexing
Then: Library card catalogs
Indexing terms chosen with search in mind
Experts generate indexing terms
Terms are very high quality
Terms chosen from controlled vocabulary
Now: Social media tagging
Tags not always chosen with search in mind
Users generate tags
Tags can be noisy or even incorrect
Tags chosen from folksonomies

Types of User Tags
Content-based: car, woman, sky
Context-based: new york city, empire state building
Attribute: nikon (type of camera), black and white (type of movie), homepage (type of web page)
Subjective: pretty, amazing, awesome
Organizational: to do, my pictures, readme

Searching Tags
Searching user tags is challenging
Most items have only a few tags
Tags are very short
Boolean, probabilistic, vector space, and language modeling will fail if used naïvely
Must overcome the vocabulary mismatch problem between the query and tags

Tag Expansion
Can overcome the vocabulary mismatch problem by expanding the tag representation with external knowledge
Possible external sources
Thesaurus
Web search results
Query logs
After tags have been expanded, can use standard retrieval models

Tag Expansion Using Search Results
Age of Aquariums - Tropical Fish: Huge educational aquarium site for tropical fish hobbyists, promoting responsible fish keeping internationally since 1997.
The Krib (Aquaria and Tropical Fish): This site contains information about tropical fish aquariums, including archived usenet postings and e-mail discussions, along with new ...
…
Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and ...: Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and Ponds at AquariumFish.net.
[Figure: P(w | “tropical fish”), a word distribution estimated from the search results, plotted over the words fish, tropical, aquariums, goldfish, bowls]
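A minimal sketch of how a distribution like P(w | “tropical fish”) could be estimated from result snippets, assuming simple relative word frequencies with no stopword removal. The function name `expand_tag` and the shortened snippets are illustrative assumptions, not part of the slides.

```python
import re
from collections import Counter

def expand_tag(snippets):
    """Estimate P(w | tag) as the relative frequency of each word in the snippets."""
    words = []
    for text in snippets:
        words.extend(re.findall(r"[a-z]+", text.lower()))
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Shortened versions of the search result snippets for the tag "tropical fish".
snippets = [
    "Huge educational aquarium site for tropical fish hobbyists",
    "information about tropical fish aquariums",
    "Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and Ponds",
]
model = expand_tag(snippets)
top = sorted(model, key=model.get, reverse=True)[:3]
```

Once each tag is expanded this way, the expanded representation can be scored with a standard retrieval model in place of the short original tag.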

Searching Tags
Even with tag expansion, searching tags is challenging
Tags are inherently noisy and incorrect
Many items may not even be tagged!
Typically easier to find popular items with many tags than less popular items with few/no tags

Inferring Missing Tags
How can we automatically tag items with few or no tags?
Uses of inferred tags
Improved tag search
Automatic tag suggestion

Methods for Inferring Tags
TF.IDF
Suggest tags that have a high TF.IDF weight in the item
Only works for textual items
Classification
Train a binary classifier for each tag
Performs well for popular tags, but not as well for rare tags
Maximal marginal relevance
Finds tags that are relevant to the item and novel with respect to existing tags
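The TF.IDF method above can be sketched as follows. This is only an illustration: the add-one smoothing inside the IDF, the function name `suggest_tags`, and the tiny collection are all assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def suggest_tags(item_text, collection, k=3):
    """Return the k words of item_text with the highest TF.IDF weight."""
    n_docs = len(collection)
    doc_freq = Counter()
    for doc in collection:
        doc_freq.update(set(tokenize(doc)))
    tf = Counter(tokenize(item_text))
    # Add-one smoothing (an assumption) keeps unseen words from dividing by zero.
    weights = {w: f * math.log((n_docs + 1) / (doc_freq[w] + 1)) for w, f in tf.items()}
    # Break ties alphabetically so the output is deterministic.
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]

collection = [
    "tropical fish and a heated aquarium",
    "a photo of the empire state building and sky",
    "the fish market in the city and harbor",
]
tags = suggest_tags("goldfish and tropical fish in a fish bowl", collection, k=2)
```

Words that are rare in the collection but present in the item ("bowl", "goldfish") win over frequent ones ("and", "fish"), which is the behavior a tag suggester wants.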

Browsing and Tag Clouds
Search is useful for finding items of interest
Browsing is more useful for exploring collections of tagged items
Various ways to visualize collections of tags
Tag lists
Tag clouds
Alphabetical order
Grouped by category
Formatted/sorted according to popularity
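One way to format a tag cloud by popularity is sketched below, assuming font size grows with the logarithm of the tag count and tags are listed alphabetically. The size range and the counts are made-up values for illustration.

```python
import math

def tag_cloud_sizes(tag_counts, min_size=10, max_size=36):
    """Map each tag to a font size that grows with the log of its count (counts >= 1)."""
    logs = {t: math.log(c) for t, c in tag_counts.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all counts are equal
    sizes = {
        t: round(min_size + (v - lo) / span * (max_size - min_size))
        for t, v in logs.items()
    }
    return dict(sorted(sizes.items()))  # alphabetical order

sizes = tag_cloud_sizes({"wedding": 1000, "art": 10, "beach": 100})
```

The log scale keeps a few very popular tags from shrinking everything else to the minimum size.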

Example Tag Cloud
animals architecture art australia autumn baby band barcelona beach berlin
birthday black blackandwhite blue california cameraphone canada canon
car cat chicago china christmas church city clouds color concert day dog
england europe family festival film florida flower flowers food
france friends fun garden germany girl graffiti green halloween hawaii
holiday home house india ireland italy japan july kids lake landscape light live
london macro me mexico music nature new newyork night
nikon nyc ocean paris park party people portrait red river rock
sanfrancisco scotland sea seattle show sky snow spain spring street
summer sunset taiwan texas thailand tokyo toronto travel
tree trees trip uk usa vacation washington water wedding

Searching with Communities
What is an online community?
Groups of entities that interact in an online environment and share common goals, traits, or interests
Examples
Baseball fan community
Digital photography community
Not all communities are made up of humans!
Web communities are collections of web pages that are all about a common topic

Finding Communities
What are the characteristics of a community?
Entities within a community are similar to each other
Members of a community are likely to interact more with other members of the community than those outside of the community
Can represent interactions between a set of entities as a graph
Vertices are entities
Edges (directed or undirected) indicate interactions between the entities

Graph Representation
[Figure: example graph over nodes 1-7, shown with a table that gives each node's vector representation]

HITS
Hyperlink-induced Topic Search (HITS) algorithm can be used to find communities
Link analysis algorithm, like PageRank
Each entity has a hub and authority score
Based on a circular set of assumptions
Good hubs point to good authorities
Good authorities are pointed to by good hubs
Iterative algorithm: starting from equal scores, repeatedly update the hub and authority scores from the link structure, then normalize them

HITS Example
[Figure: HITS on an example graph over three iterations. Each iteration is shown in three panels: Input, Update Scores, and Normalize Scores. Every node starts with the score pair 1, 1.]
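The update-then-normalize loop of HITS can be sketched as below. The sketch assumes that both updates read the previous iteration's scores and that scores are normalized to sum to 1. The edge list is a hypothetical toy graph, not the graph from the example figure.

```python
def hits(edges, iterations=3):
    """Run HITS on a directed graph given as (source, target) edges.

    Returns (hub, authority) dicts, each normalized to sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Both updates read the scores from the previous iteration.
        new_auth = {n: 0.0 for n in nodes}
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]  # authority: sum of hubs pointing in
            new_hub[src] += auth[dst]  # hub: sum of authorities pointed to
        auth_total = sum(new_auth.values()) or 1.0
        hub_total = sum(new_hub.values()) or 1.0
        auth = {n: v / auth_total for n, v in new_auth.items()}
        hub = {n: v / hub_total for n, v in new_hub.items()}
    return hub, auth

edges = [(1, 3), (2, 3), (2, 4), (3, 4)]  # hypothetical toy graph
hub, auth = hits(edges)
```

In this toy graph node 2 links to both well-cited nodes and ends up with the largest hub score, while node 1 has no incoming links and an authority score of 0.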

Finding Communities
HITS
Can apply HITS to the entity interaction graph to find communities
Entities with large authority scores are the “core” or “authoritative” members of the community
Clustering
Apply agglomerative or K-means clustering to the entity graph
How to choose K?
Evaluating community finding algorithms is hard
Can use communities in various ways to improve search, browsing, expert finding, recommendation, etc.

Community Based Question Answering
Some complex information needs can’t be answered by traditional search engines
Information from multiple sources
Human expertise
Community based question answering tries to overcome these limitations
Searcher enters question
Community members answer question

Example Questions

Community Based Question Answering
Pros
Can find answers to complex/obscure questions
Answers are from humans, not algorithms
Can search archive of previous questions/answers
Cons
Often takes time to get a response
Some questions never get answered
Answers may be wrong

Question Answering Models
How can we effectively search an archive of question/answer pairs?
Can be treated as a translation problem
Translate a question into a related question
Translate a question into an answer
Translation-based language model:
Enhanced translation model:

Computing Translation Probabilities
Translation probabilities are learned from a parallel corpus
Most often used for learning inter-language probabilities
Can be used for intra-language probabilities
Treat question/answer pairs as a parallel corpus
Various tools exist for computing translation probabilities from a parallel corpus
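A hedged sketch of scoring an archived answer against a question with translation probabilities. It assumes one smoothed mixture form, P(Q|A) = product over question words w of ((1 - beta) * f(w,A) + beta * sum over t of P(w|t) * f(t,A) + mu * c(w)/|C|) / (|A| + mu). The values of `beta` and `mu`, the pseudo-count for unseen words, and the translation table are illustrative assumptions, not values from the slides.

```python
import math
from collections import Counter

def translation_score(question, answer, trans, coll_counts, coll_len, beta=0.5, mu=10.0):
    """Log P(question | answer) under a smoothed translation mixture.

    trans[(w, t)] is an assumed probability of translating answer word t
    into question word w.
    """
    freqs = Counter(answer)
    log_p = 0.0
    for w in question:
        translated = sum(trans.get((w, t), 0.0) * f for t, f in freqs.items())
        # 0.5 is an assumed pseudo-count for words unseen in the collection.
        background = mu * coll_counts.get(w, 0.5) / coll_len
        p = ((1 - beta) * freqs[w] + beta * translated + background) / (len(answer) + mu)
        log_p += math.log(p)
    return log_p

trans = {("aquarium", "tank"): 0.4}  # hypothetical P(question word | answer word)
coll_counts = {"aquarium": 1}
tank_answer = ["clean", "the", "tank", "weekly"]
cat_answer = ["feed", "the", "cat", "daily"]
s_tank = translation_score(["aquarium"], tank_answer, trans, coll_counts, coll_len=100)
s_cat = translation_score(["aquarium"], cat_answer, trans, coll_counts, coll_len=100)
```

The answer about a "tank" scores higher for the question word "aquarium" even though the two share no words, which is how translation probabilities bridge vocabulary mismatch.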

Example Question/Answer Translations

Collaborative Search Scenarios
[Figure: two panels, Co-located Collaborative Searching and Remote Collaborative Searching]

Collaborative Search
Challenges
How do users interact with the system?
How do users interact with each other?
How is data shared?
What data persists across sessions?
Very few commercial collaborative search systems
Likely to see more of this type of system in the future

Document Filtering
Ad hoc retrieval
Document collections and information needs change with time
Results returned when query is entered
Document filtering
Document collections change with time, but information needs are static (long-term)
Long-term information needs represented as a profile
Documents entering the system that match the profile are delivered to the user via a push mechanism

Profiles
Represents long-term information needs
Can be represented in different ways
Boolean or keyword query
Sets of relevant and non-relevant documents
Relational constraints
• “published before 1990”
• “price in the $10-$25 range”
Actual representation usually depends on underlying filtering model
Can be static (static filtering) or updated over time (adaptive filtering)

Document Filtering Scenarios
[Figure: two panels, Static Filtering and Adaptive Filtering. In each, a document stream (t = 2, t = 3, t = 5, t = 8) is matched against Profile 1, Profile 2, and Profile 3. In the adaptive panel the profiles are updated over time to Profile 1.1, Profile 2.1, and Profile 3.1.]

Static Filtering
Given a fixed profile, how can we determine if an incoming document should be delivered?
Treat as an IR problem
Boolean
Vector space
Language modeling
Treat as supervised learning problem
Naïve Bayes
Support vector machines

Adaptive Filtering
In adaptive filtering, profiles are dynamic
How can profiles change?
User can explicitly update the profile
User can provide (relevance) feedback about the documents delivered to the profile
Implicit user behavior can be captured and used to update the profile

Fast Filtering with Millions of Profiles
Real filtering systems
May have thousands or even millions of profiles
Many new documents will enter the system daily
How to efficiently filter in such a system?
Most profiles are represented as text or a set of features
Build an inverted index for the profiles
Distill incoming documents as “queries” and run against index
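The inverted-index idea above can be sketched as follows, assuming keyword profiles and a simple term-overlap threshold as the matching rule. The function names and the `min_overlap` parameter are hypothetical.

```python
from collections import defaultdict

def build_profile_index(profiles):
    """Map each term to the set of profile ids whose keyword profile contains it."""
    index = defaultdict(set)
    for profile_id, terms in profiles.items():
        for term in terms:
            index[term].add(profile_id)
    return index

def match_profiles(index, document_terms, min_overlap=1):
    """Treat the document's terms as a query and count term overlap per profile."""
    overlap = defaultdict(int)
    for term in set(document_terms):
        for profile_id in index.get(term, ()):
            overlap[profile_id] += 1
    return sorted(p for p, n in overlap.items() if n >= min_overlap)

profiles = {"p1": {"tropical", "fish"}, "p2": {"baseball"}, "p3": {"fish", "market"}}
index = build_profile_index(profiles)
delivered = match_profiles(index, "fresh fish at the market".split(), min_overlap=2)
```

Only profiles that share at least one term with the incoming document are ever touched, so the cost per document depends on the document's terms rather than on the total number of profiles.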

Evaluation of Filtering Systems
Definition of “good” depends on the purpose of the underlying filtering system
Generic filtering evaluation measure:
α = 2, β = 0, δ = -1, and γ = 0 is widely used
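A small sketch of how such a measure might be computed, assuming it is a linear utility over the four outcome counts: alpha weights relevant documents delivered (TP), beta non-relevant documents withheld (TN), delta non-relevant documents delivered (FP), and gamma relevant documents missed (FN). That assignment of weights to counts is an assumption here, and the counts are made up.

```python
def filtering_utility(tp, tn, fp, fn, alpha=2, beta=0, delta=-1, gamma=0):
    """Linear utility: alpha*TP + beta*TN + delta*FP + gamma*FN (assumed form)."""
    return alpha * tp + beta * tn + delta * fp + gamma * fn

# Hypothetical outcome counts for one profile.
score = filtering_utility(tp=8, tn=100, fp=5, fn=3)
```

With the default weights the utility reduces to 2 * TP - FP, so each relevant delivery is worth two credits and each non-relevant delivery costs one.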

Collaborative Filtering
In static and adaptive filtering, users and their profiles are assumed to be independent of each other
Similar users are likely to have similar preferences
Collaborative filtering exploits relationships between users to improve how items (documents) are matched to users (profiles)

Recommender Systems
Recommender systems recommend items that a user may be interested in
Examples
Amazon.com
NetFlix
Recommender systems use collaborative filtering to recommend items to users

Recommender System Algorithms
Input
<user, item, rating> tuples for items that the user has explicitly rated
Typically represented as a user-item matrix
Output
<user, item, rating> tuples for items that the user has not rated
Can be thought of as filling in the missing entries of the user-item matrix
Most algorithms infer missing ratings based on the ratings of similar users
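Filling in a missing entry from the ratings of similar users can be sketched as below. The sketch assumes cosine similarity over co-rated items and a similarity-weighted average, which is one simple choice among many. The users and ratings are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for item."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += sim
    return num / den if den else None

# Sparse user-item matrix as nested dicts; bob has not rated item "c".
ratings = {
    "ann": {"a": 5, "b": 1, "c": 5},
    "bob": {"a": 5, "b": 1},
    "cat": {"a": 1, "b": 5, "c": 1},
}
pred = predict_rating(ratings, "bob", "c")
```

Because bob's ratings agree with ann's and disagree with cat's, the prediction (about 3.9) is pulled toward ann's rating of 5.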

Recommender Systems
[Figure: example users with known numeric ratings, and users whose ratings are unknown (“?”)]

Collaborative Searching
Traditional search assumes a single searcher
Collaborative search involves a group of users with a common goal, searching together in a collaborative setting
Example scenarios
Students doing research for a history report
Family members searching for information on how to care for an aging relative
Team members working to gather information and requirements for an industrial project

Collaborative Search
Two types of collaborative search settings, depending on where participants are physically located
Co-located
Participants in same location
Co-Search system
Remote collaborative
Participants in different locations
Search-Together system

Static Filtering with Language Models
Assume profile consists of K relevant documents (Ti), each with weight αi
Probability of a word given the profile is:
KL divergence between profile and document model is used as score:
If -KL(P||D) ≥ θ, then deliver D to P
Threshold (θ) can be optimized for some metric
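The delivery rule above can be sketched as follows. The sketch assumes the profile model is a weighted mixture of the profile documents' maximum-likelihood models, that both models are linearly smoothed with a background model (weight `lam`), and that every word appears in the background vocabulary. These smoothing choices and all names are assumptions.

```python
import math
from collections import Counter

def mle(tokens):
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}

def profile_model(docs, weights, background, lam=0.1):
    """P(w | P): weighted mixture of profile documents, smoothed with the background."""
    total = sum(weights)
    models = [mle(d) for d in docs]
    return {
        w: (1 - lam) * sum(a * m.get(w, 0.0) for a, m in zip(weights, models)) / total
        + lam * bg
        for w, bg in background.items()
    }

def document_model(tokens, background, lam=0.1):
    m = mle(tokens)
    return {w: (1 - lam) * m.get(w, 0.0) + lam * bg for w, bg in background.items()}

def neg_kl(p, d):
    """-KL(P || D), summed over the background vocabulary."""
    return -sum(pw * math.log(pw / d[w]) for w, pw in p.items() if pw > 0)

def should_deliver(profile, doc_tokens, background, theta):
    return neg_kl(profile, document_model(doc_tokens, background)) >= theta

background = {w: 0.25 for w in ["fish", "tropical", "baseball", "score"]}
profile = profile_model([["tropical", "fish"], ["fish", "fish"]], [1.0, 1.0], background)
on_topic = should_deliver(profile, ["tropical", "fish", "fish"], background, theta=-1.0)
off_topic = should_deliver(profile, ["baseball", "score"], background, theta=-1.0)
```

The threshold `theta=-1.0` is arbitrary here; in practice it would be tuned on held-out data for the chosen metric.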

Adaptive Filtering Models
Rocchio
Profiles treated as vectors
Relevance-based language models
Profiles treated as language models
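A sketch of a Rocchio-style profile update, with the profile and documents held as sparse term-weight dictionaries. The weights `alpha`, `beta`, and `gamma` are conventional Rocchio parameters chosen for illustration, and dropping non-positive weights is an assumed design choice.

```python
def rocchio_update(profile, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the profile toward relevant documents and away from non-relevant ones."""
    def centroid(docs):
        if not docs:
            return {}
        terms = {t for d in docs for t in d}
        return {t: sum(d.get(t, 0.0) for d in docs) / len(docs) for t in terms}

    rel, non = centroid(relevant), centroid(nonrelevant)
    terms = set(profile) | set(rel) | set(non)
    updated = {
        t: alpha * profile.get(t, 0.0) + beta * rel.get(t, 0.0) - gamma * non.get(t, 0.0)
        for t in terms
    }
    # Keep only positive weights (an assumed simplification).
    return {t: w for t, w in updated.items() if w > 0}

profile = {"fish": 1.0}
relevant = [{"fish": 1.0, "aquarium": 1.0}]     # documents the user marked relevant
nonrelevant = [{"fish": 1.0, "market": 1.0}]    # documents the user marked non-relevant
updated = rocchio_update(profile, relevant, nonrelevant)
```

After the update the profile gains "aquarium" from the relevant feedback, while "market" never enters because its weight would be negative.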

Summary of Filtering Models