ch tchapter 10 - brigham young universitycs453ta/notes/ch10.pdf · pros ¾can find answers to...

45
Ch t 10 Chapter 10 Social Search Social Search

Upload: others

Post on 04-Sep-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Ch t 10Chapter 10

Social SearchSocial Search

Page 2: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Social SearchSocial search

Communities of users actively participating in the searchCommunities of users actively participating in the search process

Goes beyond classical search tasks

Key differences

Users interact with the system

Users interact with other users implicitly/explicitly

2

Page 3: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Web 2.0SSocial search includes, but is not limited to, the so-

called social media site Collectively referred to as “Web 2 0” as opposed toCollectively referred to as Web 2.0 as opposed to

the classical notion of the Web (“Web 1.0”)

Social media sitesSocial media sitesUser generated contentU t th i d th ’ t tUsers can tag their own and other’s contentUsers can share favorites, tags, etc., with others

Examples.Digg, Twitter, Flickr, YouTube, Del.icio.us,

CiteULike, MySpace, Facebook, and LinkedIn3

Page 4: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Social Search Topics User tags

Searching within communitiesSearching within communities

Adaptive filtering

Recommender systems

Peer-to-peer and metasearchp

4

Page 5: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

User Tags & Manual Indexing Then: Library card catalogs

Indexing terms chosen with search in mind

Experts generate indexing terms

Terms are very high quality

Terms chosen from controlled vocabulary

Now: Social media tagging

Tags not always chosen with search in mind

Users generate tagsg g

Tags can be noisy or even incorrect

Tags chosen from folksonomiesTags chosen from folksonomies

5

Page 6: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Types of User Tags Content-based

car, woman, sky

Context-basednew york city, empire state building

Attributenikon (type of camera), black and white (type of movie),

homepage (type of web page)

Subjectivepretty, amazing, awesome

Organizational to do, my pictures, readme

6

Page 7: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Searching Tags Searching user tags is challenging

Most items have only a few tags

Tags are very short

Boolean probabilistic vector space & languageBoolean, probabilistic, vector space & language modeling will fail if use naïvely

M st o ercome the ocab lar mismatch problemMust overcome the vocabulary mismatch problembetween the query and tags

7

Page 8: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Tag Expansion Can overcome vocabulary mismatch problem by

expanding tag representation w/ external knowledge

Possible external sourcesThesaurus

Web search results

Query logs

After tags have been expanded, can use standard retrieval models

8

Page 9: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Tag Expansion Using Search ResultsAge of Aquariums - Tropical Fish Huge educational aquarium site for tropical fish hobbyists, promoting responsible fish keeping internationally since 1997.

The Krib (Aquaria and Tropical Fish) This site contains information about tropical fish aquariums, including archived usenet postings and e-mail discussions, along with new ...

Keeping Tropical Fish and Goldfish in Aquariums Fish Bowls andKeeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and ... Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and Ponds at AquariumFish.net.

al ums

h

fish

tropi

caaq

uariu

gold

fish

bow

ls

9P(w | “tropical fish”)

Page 10: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Searching TagsEven with tag expansion, searching tags is challenging

Tags are inherently noisy and incorrectTags are inherently noisy and incorrect

Many items may not even be tagged!

Typically easier to find popular items with many tags than less popular items with few/no tags

10

Page 11: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Inferring Missing TagsHow can we automatically tag items with few or no tags?

Uses of inferred tagsgImproved tag search

Automatic tag suggestiong gg

11

Page 12: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Methods for Inferring TagsTF.IDF

Suggest tags that have a high TF.IDF weight in the item

Only works for textual items

ClassificationTrain binary classifier for each tag

Performs well for popular tags, but not as well for p p grare tags

Maximal marginal relevanceMaximal marginal relevance

Finds tags that are relevant to the item and novel with respect to existing tags

12

Page 13: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Browsing and Tag CloudsSearch is useful for finding items of interest

Browsing is more useful for exploring collections of g p gtagged items

Various ways to visualize collections of tagsa ous ays o sua e co ec o s o agsTag lists

Tag cloudsag c ouds

Alphabetical order

Grouped by categoryp y g y

Formatted/sorted according to popularity

13

Page 14: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Example Tag Cloudanimals architecture art australia autumn baby band barcelona beach berlin

birthday black blackandwhite blue california cameraphone canada canonycar cat chicago china christmas church city clouds color concert day dog

england europe family festival film florida flower flowers foodfrance friends fun garden germany girl graffiti green halloween hawaii

holiday home house india ireland italy japan july kids lake landscape light live

london macro me me ico music nature new newyork nightlondon macro me mexico music nature new newyork nightnikon nyc ocean paris park party people portrait red river rock

sanfrancisco scotland sea seattle show sky snow spain spring streetsanfrancisco scotland sea seattle show sky snow spain spring street

summer sunset taiwan texas thailand tokyo toronto traveltree trees trip uk usa vacation washington water wedding

14

p g

Page 15: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Searching with CommunitiesWhat is an online community?

Groups of entities that interact in an online environment and share common goals traits or interestsand share common goals, traits, or interests

ExamplesBaseball fan community

Digital photography community

Not all communities are made up of humans!

Web communities are collections of web pages that p gare all about a common topic

15

Page 16: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Finding CommunitiesWhat are the characteristics of a community?

Entities within a community are similar to each other

Members of a community are likely to interact more with other members of the community than those outside of the communityof the community

Can represent interactions between a set of entities as a grapha graph

Vertices are entities

Edges (directed or undirected) indicate interactionsEdges (directed or undirected) indicate interactions between the entities

16

Page 17: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Graph Representation11

4

22

5

4433

55

7766

Node: 1 2 3 4 5 6 7000

000

100

000

100

000

000

Node: 3 5 6 7

000

000

001

000

000

000

000

Vector:

17

00

10

10

10

00

00

00

Page 18: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

HITSHyperlink-induced Topic Search (HITS) algorithm can be

used to find communitiesLink analysis algorithm like PageRankLink analysis algorithm, like PageRank

Each entity has a hub and authority score

Based on a circular set of assumptionsGood hubs point to good authorities

Good authorities are pointed to by good hubs

Iterative algorithm:g

18

Page 19: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

19

Page 20: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

HITS Example1, 1

1 1

1, 1

1, 1

2, 0

0 3

0, 1

0, 1

.33, 0

0 50

0, .17

0, .17

Iteration 1: Input Iteration 1: Update Scores Iteration 1: Normalize Scores

1, 1

1, 1

1, 11, 1

1, 1

0, 3

3, 00, 0

.17, .17

0, .50

.50, 00,0

.67, 0

0, 1

0, .50

0, .50

.33, 0

0, .43

0, .21

0, .21

.33, 0

0, .50

0, .17

0, .17

Iteration 2: Input Iteration 2: Update Scores Iteration 2: Normalize Scores

.50, .33.83, 0

0,0.25, .14

.42, 00,0

.17, .17.50, 0

0,0

Iteration 3: Input Iteration 3: Update Scores Iteration 3: Normalize Scores

.57, 0

0,1

0, .42

0, .42

.31, 0

0, .46

0, .19

0, .19

.33, 0

0, .43

0, .21

0, .21

20

.43, .33.86, 0

0,0.23, .16

.46, 00,0

.25, .14.42, 0

0,0

Page 21: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Finding CommunitiesHITS

Can apply HITS to entity interaction graph to find communitiescommunities

Entities with large authority scores are the “core” or “authoritative” members of the communityy

ClusteringApply agglomerative or K means clustering to entity graphApply agglomerative or K-means clustering to entity graph

How to choose K?

Evaluating community finding algorithms is hard

Can use communities in various ways to improve y psearch, browsing, expert finding, recommendation, etc.

21

Page 22: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Community Based Question AnsweringSome complex information needs can’t be answered by

traditional search enginesInformation from multiple sources

Human expertise

Community based question answering tries to overcome these limitations

Searcher enters question

Community members answer question

22

Page 23: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Example Questions

23

Page 24: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Community Based Question AnsweringPros

Can find answers to complex/obscure questions

Answers are from humans, not algorithms

Can search archive of previous questions/answers

ConsOften takes time to get a responseOften takes time to get a response

Some questions never get answered

Answers may be wrongAnswers may be wrong

24

Page 25: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Question Answering ModelsHow can we effectively search an archive of question/

answer pairs?Can be treated as a translation problemCan be treated as a translation problem

Translate a question into a related question

Translate a question into an answerTranslate a question into an answer

Translation-based language model:

Enhanced translation model:

25

Page 26: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Computing Translation ProbabilitiesTranslation probabilities are learned from a parallel corpus

Most often used for learning inter language probabilitiesMost often used for learning inter-language probabilities

Can be used for intra-language probabilitiesTreat question / answer pairs are parallel corpus

Various tools exist for computing translation probabilitiesVarious tools exist for computing translation probabilities from a parallel corpus

26

Page 27: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Example Question/Answer Translations

27

Page 28: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Collaborative Search Scenarios

28

Co-located Collaborative Searching Remote Collaborative Searching

Page 29: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Collaborative SearchChallenges

How do users interact with system?y

How do users interact with each other?

How is data shared?

What data persists across sessions?

Very few commercial collaborative search systemsVery few commercial collaborative search systems

Likely to see more of this type of system in the future

29

Page 30: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Document FilteringAd hoc retrieval

Document collections and information needs changegwith time

Results returned when query is entered

Document filteringDocument collections change with time, but information g ,

needs are static (long-term)

Long term information needs represented as a profile

Documents entering system that match the profile are delivered to the user via a push mechanism

30

Page 31: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

ProfilesR t l t i f ti dRepresents long term information needs

Can be represented in different waysBoolean or keyword query

Sets of relevant and non-relevant documents

Relational constraints

• “published before 1990

• “price in the $10-$25 range”

Actual representation usually depends on underlying filtering model

Can be static (static filtering) or updated over time ( g) p(adaptive filtering)

31

Page 32: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Document Filtering Scenarios

Profile1 Profile

1Profile

1.1

Profile2

Profile

Profile2

Profile

Profile 2.1

Profile3 Profile

3Profile

3.1

Document Streamt = 2 t = 3 t = 5 t = 8

Document Streamt = 2 t = 3 t = 5 t = 8

Static Filtering Adaptive Filtering

32

Page 33: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Static FilteringGi fi d fil h d t i ifGiven a fixed profile, how can we determine if an

incoming document should be delivered?

Treat as an IR problemBoolean

Vector space

Language modeling

Treat as supervised learning problemNaïve Bayes

Support vector machines

33

Page 34: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Adaptive FilteringIn adaptive filtering, profiles are dynamic

How can profiles change?How can profiles change?User can explicitly update the profile

User can provide (relevance) feedback about theUser can provide (relevance) feedback about the documents delivered to the profile

Implicit user behavior can be captured and used to update the profile

34

Page 35: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Fast Filtering with Millions of ProfilesReal filtering systems

May have thousands or even millions of profiles

Many new documents will enter the system daily

How to efficiently filter in such a system?How to efficiently filter in such a system?Most profiles are represented as text or a set of features

Build an inverted index for the profilesBuild an inverted index for the profiles

Distill incoming documents as “queries” and run against index

35

Page 36: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Evaluation of Filtering SystemsDefinition of “good” depends on the purpose of the

underlying filtering system

Generic filtering evaluation measure:

α = 2, β = 0, δ = -1, and γ = 0 is widely used

36

Page 37: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Collaborative FilteringIn static and adaptive filtering, users and their profiles

are assumed to be independent of each other

Similar users are likely to have similar preferences

Collaborative filtering exploits relationships betweenCollaborative filtering exploits relationships between users to improve how items (documents) are matched to users (profiles)

37

Page 38: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Recommender SystemsRecommender systems recommend items that a user

may be interested in

Examples

Amazon.com

NetFlix

R d t ll b ti filt i tRecommender systems use collaborative filtering to recommend items to users

38

Page 39: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Recommender System AlgorithmsInput

<user, item, rating> tuples for items that the user has li itl t dexplicitly rated

Typically represented as a user-item matrix

Output<user, item, rating> tuples for items that the user

has not rated

Can be thought of as filling in the missing entries of the user-item matrixuser item matrix

Most algorithms infer missing ratings based on the ratings of similar usersratings of similar users

39

Page 40: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Recommender Systems

1111

??11 11

44

22

33

11 55??

55

??

55?? 5555??

40

Page 41: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Collaborative SearchingT diti l h i l hTraditional search assumes single searcher

Collaborative search involves a group of users, with a common goal - searching together in a collaborative setting

Example scenariosStudents doing research for a history report

Family members searching for information on how to care for an aging relative

T b ki t th i f ti dTeam member working to gather information and requirements for an industrial project

41

Page 42: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Collaborative SearchTwo types of collaborative search settings depending

on where participants are physically located

Co-locatedParticipants in same location

Co-Search system

Remove collaborativeRemove collaborativeParticipants in different locations

Search Together systemSearch-Together system

42

Page 43: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Static Filtering with Language ModelsAssume profile consists of K relevant documents (Ti),

each with weight αi

Probability of a word given the profile is:

KL divergence between profile and document model isKL divergence between profile and document model is used as score:

If –KL(P||D) ≥ θ, then deliver D to P( || ) ,Threshold (θ) can be optimized for some metric

43

Page 44: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Adaptive Filtering ModelsRocchio

Profiles treated as vectors

Relevance-based language modelsProfiles treated as language models

44

Page 45: Ch tChapter 10 - Brigham Young Universitycs453ta/notes/ch10.pdf · Pros ¾Can find answers to complex/obscure questions ¾Answers are from humans, not algorithms ¾Can search archive

Summary of Filtering Models

45