Chapter 10 - Brigham Young University, cs453ta/notes/ch10.pdf

Post on 04-Sep-2020


Chapter 10

Social Search

Social Search

Social search: communities of users actively participating in the search process

Goes beyond classical search tasks

Key differences

Users interact with the system

Users interact with other users implicitly/explicitly

Web 2.0

Social search includes, but is not limited to, the so-called social media sites

Collectively referred to as “Web 2.0”, as opposed to the classical notion of the Web (“Web 1.0”)

Social media sites

User-generated content

Users can tag their own and others’ content

Users can share favorites, tags, etc., with others

Examples: Digg, Twitter, Flickr, YouTube, Del.icio.us, CiteULike, MySpace, Facebook, and LinkedIn

Social Search Topics

User tags

Searching within communities

Adaptive filtering

Recommender systems

Peer-to-peer and metasearch

User Tags & Manual Indexing

Then: Library card catalogs

Indexing terms chosen with search in mind

Experts generate indexing terms

Terms are very high quality

Terms chosen from controlled vocabulary

Now: Social media tagging

Tags not always chosen with search in mind

Users generate tags

Tags can be noisy or even incorrect

Tags chosen from folksonomies

Types of User Tags

Content-based: car, woman, sky

Context-based: new york city, empire state building

Attribute: nikon (type of camera), black and white (type of movie), homepage (type of web page)

Subjective: pretty, amazing, awesome

Organizational: to do, my pictures, readme

Searching Tags

Searching user tags is challenging

Most items have only a few tags

Tags are very short

Boolean, probabilistic, vector space, and language modeling approaches will fail if used naïvely

Must overcome the vocabulary mismatch problem between the query and tags

Tag Expansion

Can overcome the vocabulary mismatch problem by expanding the tag representation with external knowledge

Possible external sources

Thesaurus

Web search results

Query logs

After tags have been expanded, can use standard retrieval models

Tag Expansion Using Search Results

Age of Aquariums - Tropical Fish: Huge educational aquarium site for tropical fish hobbyists, promoting responsible fish keeping internationally since 1997.

The Krib (Aquaria and Tropical Fish): This site contains information about tropical fish aquariums, including archived usenet postings and e-mail discussions, along with new ...

Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and ...: Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and Ponds at AquariumFish.net.

[Figure: estimated distribution P(w | “tropical fish”) over expansion terms, including tropical, aquariums, fish, goldfish, and bowls]
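
The expansion step can be sketched in a few lines. This is a minimal sketch, assuming P(w | tag) is estimated as the relative frequency of each word in the result snippets returned for the tag; the function name `expand_tag` and the small stopword list are hypothetical, and the snippets are shortened versions of the titles shown above.

```python
import re
from collections import Counter


def expand_tag(snippets, stopwords=frozenset()):
    """Estimate P(w | tag) as the relative frequency of w in the snippets."""
    counts = Counter()
    for text in snippets:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


# Snippet titles retrieved for the tag "tropical fish" (shortened).
snippets = [
    "Age of Aquariums - Tropical Fish",
    "The Krib (Aquaria and Tropical Fish)",
    "Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and Ponds",
]
model = expand_tag(snippets, stopwords=frozenset({"of", "the", "and", "in"}))
top = sorted(model, key=model.get, reverse=True)[:3]
print(top)
```

The expanded representation is an ordinary word distribution, so it can be compared with a query using a standard retrieval model.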

Searching Tags

Even with tag expansion, searching tags is challenging

Tags are inherently noisy and incorrect

Many items may not even be tagged!

Typically easier to find popular items with many tags than less popular items with few/no tags

Inferring Missing Tags

How can we automatically tag items with few or no tags?

Uses of inferred tags

Improved tag search

Automatic tag suggestion

Methods for Inferring Tags

TF.IDF

Suggest tags that have a high TF.IDF weight in the item

Only works for textual items

Classification

Train binary classifier for each tag

Performs well for popular tags, but not as well for rare tags

Maximal marginal relevance

Finds tags that are relevant to the item and novel with respect to existing tags
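
The maximal marginal relevance idea can be illustrated with a greedy selection loop. This is a sketch under stated assumptions: the relevance scores, the similarity function `prefix_sim`, and the trade-off parameter `lam` are all hypothetical, and a real system would estimate relevance and similarity from data.

```python
def mmr_select(relevance, similarity, existing, k, lam=0.7):
    """Greedily pick k tags that are relevant to the item and novel.

    relevance:  dict tag -> relevance of the tag to the item (hypothetical scores)
    similarity: function (tag_a, tag_b) -> similarity in [0, 1] (hypothetical)
    existing:   tags already assigned to the item
    """
    selected = list(existing)
    candidates = set(relevance) - set(selected)
    chosen = []
    while candidates and len(chosen) < k:
        def score(tag):
            # Penalize tags that resemble anything already selected.
            redundancy = max((similarity(tag, s) for s in selected), default=0.0)
            return lam * relevance[tag] - (1 - lam) * redundancy

        best = max(sorted(candidates), key=score)
        chosen.append(best)
        selected.append(best)
        candidates.remove(best)
    return chosen


def prefix_sim(a, b):
    # Toy similarity: tags are "the same" if one is a prefix of the other.
    return 1.0 if a.startswith(b) or b.startswith(a) else 0.0


relevance = {"fish": 0.9, "fishes": 0.85, "aquarium": 0.6}
suggested = mmr_select(relevance, prefix_sim, existing=["tropical"], k=2)
print(suggested)
```

In this toy run, "fishes" is skipped in favor of "aquarium" because it is redundant with the already selected "fish", even though its relevance score is higher.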

Browsing and Tag Clouds

Search is useful for finding items of interest

Browsing is more useful for exploring collections of tagged items

Various ways to visualize collections of tags

Tag lists

Tag clouds

Alphabetical order

Grouped by category

Formatted/sorted according to popularity
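
One way to format a tag cloud by popularity is to map each tag's usage count to a font size. The sketch below assumes logarithmic scaling between a minimum and maximum point size; the scaling choice, the size range, and the counts are illustrative assumptions rather than anything prescribed by the slides. Counts must be at least 1.

```python
import math


def tag_cloud_sizes(tag_counts, min_pt=10, max_pt=32):
    """Return (tag, font size) pairs in alphabetical order, log-scaled by count."""
    lo = math.log(min(tag_counts.values()))
    hi = math.log(max(tag_counts.values()))
    span = (hi - lo) or 1.0  # avoid division by zero when all counts are equal
    return [
        (tag, round(min_pt + (math.log(count) - lo) / span * (max_pt - min_pt)))
        for tag, count in sorted(tag_counts.items())
    ]


# Hypothetical usage counts for three tags.
cloud = tag_cloud_sizes({"wedding": 100, "beach": 10, "nikon": 1})
print(cloud)
```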

Example Tag Cloud

animals architecture art australia autumn baby band barcelona beach berlin birthday black blackandwhite blue california cameraphone canada canon car cat chicago china christmas church city clouds color concert day dog england europe family festival film florida flower flowers food france friends fun garden germany girl graffiti green halloween hawaii holiday home house india ireland italy japan july kids lake landscape light live london macro me mexico music nature new newyork night nikon nyc ocean paris park party people portrait red river rock sanfrancisco scotland sea seattle show sky snow spain spring street summer sunset taiwan texas thailand tokyo toronto travel tree trees trip uk usa vacation washington water wedding

Searching with Communities

What is an online community?

Groups of entities that interact in an online environment and share common goals, traits, or interests

Examples

Baseball fan community

Digital photography community

Not all communities are made up of humans!

Web communities are collections of web pages that are all about a common topic

Finding Communities

What are the characteristics of a community?

Entities within a community are similar to each other

Members of a community are likely to interact more with other members of the community than with those outside of the community

Can represent interactions between a set of entities as a graph

Vertices are entities

Edges (directed or undirected) indicate interactions between the entities

Graph Representation

[Figure: example graph over nodes 1 to 7, with each node represented as a binary vector of its connections to the other nodes]

HITS

Hyperlink-induced Topic Search (HITS) algorithm can be used to find communities

Link analysis algorithm, like PageRank

Each entity has a hub and authority score

Based on a circular set of assumptions

Good hubs point to good authorities

Good authorities are pointed to by good hubs

Iterative algorithm:
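
The iterative update can be sketched as follows. This is a minimal sketch and not the slide's own listing: it assumes all scores start at 1, that both scores are updated from the previous iteration's values, and that each score set is normalized to sum to 1. The four-node graph is hypothetical.

```python
def hits(nodes, edges, iterations=3):
    """Run HITS on a directed graph given as a list of (source, target) edges.

    Returns (hub, authority) dictionaries keyed by node.
    """
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_auth = {n: 0.0 for n in nodes}
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]  # good authorities are pointed to by good hubs
            new_hub[src] += auth[dst]  # good hubs point to good authorities
        # Normalize each score set so that it sums to 1.
        z_auth = sum(new_auth.values()) or 1.0
        z_hub = sum(new_hub.values()) or 1.0
        auth = {n: a / z_auth for n, a in new_auth.items()}
        hub = {n: h / z_hub for n, h in new_hub.items()}
    return hub, auth


# Hypothetical graph: nodes 1 and 2 link to 3, and node 1 also links to 4.
hub, auth = hits([1, 2, 3, 4], [(1, 3), (2, 3), (1, 4)])
print(hub, auth)
```

Node 3 ends up with the largest authority score because both linking nodes point to it, and node 1 is the strongest hub because it points to both authorities.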

HITS Example

[Figure: hub and authority scores for each node of an example graph over three iterations; each iteration is shown in three panels: Input, Update Scores, and Normalize Scores]

Finding Communities

HITS

Can apply HITS to entity interaction graph to find communities

Entities with large authority scores are the “core” or “authoritative” members of the community

Clustering

Apply agglomerative or K-means clustering to entity graph

How to choose K?

Evaluating community finding algorithms is hard

Can use communities in various ways to improve search, browsing, expert finding, recommendation, etc.

Community Based Question Answering

Some complex information needs can’t be answered by traditional search engines

Information from multiple sources

Human expertise

Community based question answering tries to overcome these limitations

Searcher enters question

Community members answer question

Example Questions


Community Based Question Answering

Pros

Can find answers to complex/obscure questions

Answers are from humans, not algorithms

Can search archive of previous questions/answers

Cons

Often takes time to get a response

Some questions never get answered

Answers may be wrong

Question Answering Models

How can we effectively search an archive of question/answer pairs?

Can be treated as a translation problem

Translate a question into a related question

Translate a question into an answer

Translation-based language model:

Enhanced translation model:
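
The model formulas did not survive in this transcript. As an illustration only, the sketch below assumes the commonly used basic form of a translation-based language model, in which each question word w is generated by "translating" from the words t of the answer: P(Q | A) is the product over w in Q of the sum over t of P(w | t) P(t | A). The translation table, the maximum-likelihood estimate of P(t | A), and the `floor` value standing in for smoothing are all assumptions.

```python
import math
from collections import Counter


def translation_score(question, answer, trans_prob, floor=1e-9):
    """Return log P(Q | A) under a basic translation-based language model.

    trans_prob: dict (w, t) -> P(w | t), assumed to be learned elsewhere.
    floor:      hypothetical stand-in for smoothing, avoids log(0).
    """
    counts = Counter(answer)
    length = len(answer)
    log_p = 0.0
    for w in question:
        # Only words that occur in the answer have non-zero P(t | A).
        p = sum(trans_prob.get((w, t), 0.0) * c / length for t, c in counts.items())
        log_p += math.log(max(p, floor))
    return log_p


# Hypothetical translation probabilities.
trans_prob = {
    ("fix", "repair"): 0.5, ("fix", "fix"): 0.5,
    ("bike", "bicycle"): 0.4, ("bike", "bike"): 0.6,
}
question = ["fix", "bike"]
good = translation_score(question, ["repair", "bicycle", "chain"], trans_prob)
bad = translation_score(question, ["bake", "bread"], trans_prob)
print(good, bad)
```

The first answer shares no words with the question but still scores higher, which is the point of using translation probabilities to bridge vocabulary mismatch.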

Computing Translation Probabilities

Translation probabilities are learned from a parallel corpus

Most often used for learning inter-language probabilities

Can be used for intra-language probabilities

Treat question/answer pairs as a parallel corpus

Various tools exist for computing translation probabilities from a parallel corpus

Example Question/Answer Translations

Collaborative Search Scenarios

[Figure: two panels, “Co-located Collaborative Searching” and “Remote Collaborative Searching”]

Collaborative Search

Challenges

How do users interact with the system?

How do users interact with each other?

How is data shared?

What data persists across sessions?

Very few commercial collaborative search systems

Likely to see more of this type of system in the future

Document Filtering

Ad hoc retrieval

Document collections and information needs change with time

Results returned when query is entered

Document filtering

Document collections change with time, but information needs are static (long-term)

Long-term information needs represented as a profile

Documents entering system that match the profile are delivered to the user via a push mechanism

Profiles

Represents long-term information needs

Can be represented in different ways

Boolean or keyword query

Sets of relevant and non-relevant documents

Relational constraints

• “published before 1990”

• “price in the $10-$25 range”

Actual representation usually depends on underlying filtering model

Can be static (static filtering) or updated over time (adaptive filtering)

Document Filtering Scenarios

[Figure: two panels over a document stream at t = 2, t = 3, t = 5, and t = 8. Static Filtering: Profile 1, Profile 2, and Profile 3 stay fixed. Adaptive Filtering: profiles are updated over time, becoming Profile 1.1, Profile 2.1, and Profile 3.1]

Static Filtering

Given a fixed profile, how can we determine if an incoming document should be delivered?

Treat as an IR problem

Boolean

Vector space

Language modeling

Treat as supervised learning problem

Naïve Bayes

Support vector machines

Adaptive Filtering

In adaptive filtering, profiles are dynamic

How can profiles change?

User can explicitly update the profile

User can provide (relevance) feedback about the documents delivered to the profile

Implicit user behavior can be captured and used to update the profile

Fast Filtering with Millions of Profiles

Real filtering systems

May have thousands or even millions of profiles

Many new documents will enter the system daily

How to efficiently filter in such a system?

Most profiles are represented as text or a set of features

Build an inverted index for the profiles

Distill incoming documents as “queries” and run against index
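
The idea of indexing profiles instead of documents can be sketched as below. This is an illustrative sketch, not a production design: the class name `ProfileIndex`, the term-overlap matching rule, and the `min_overlap` threshold are hypothetical choices.

```python
from collections import defaultdict


class ProfileIndex:
    """Inverted index over profiles: each term maps to the profiles that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of profiles containing the term
        self.sizes = {}                   # profile id -> number of distinct terms

    def add_profile(self, profile_id, terms):
        terms = set(terms)
        self.sizes[profile_id] = len(terms)
        for term in terms:
            self.postings[term].add(profile_id)

    def match(self, doc_terms, min_overlap=0.5):
        """Treat the document as a query; return profiles it covers well enough."""
        hits = defaultdict(int)
        for term in set(doc_terms):
            for pid in self.postings.get(term, ()):
                hits[pid] += 1
        return sorted(pid for pid, n in hits.items() if n / self.sizes[pid] >= min_overlap)


index = ProfileIndex()
index.add_profile("p1", ["tropical", "fish"])
index.add_profile("p2", ["baseball", "scores"])
index.add_profile("p3", ["fish", "recipes", "salmon"])
matched = index.match("new tropical fish tank".split())
print(matched)
```

Only the profiles that share at least one term with the incoming document are touched, so the cost per document depends on the postings visited and not on the total number of profiles.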

Evaluation of Filtering Systems

Definition of “good” depends on the purpose of the underlying filtering system

Generic filtering evaluation measure:

α = 2, β = 0, δ = -1, and γ = 0 is widely used
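
The measure itself is missing from this transcript. The sketch below assumes it is a linear utility of the form U = α·TP + β·TN + δ·FP + γ·FN, so that the widely used setting reduces to 2·TP − FP; the assignment of each weight to each outcome count is an assumption, and the document ids are hypothetical.

```python
def filtering_utility(delivered, relevant, all_docs,
                      alpha=2.0, beta=0.0, delta=-1.0, gamma=0.0):
    """Linear utility over the four delivery outcomes (assumed form)."""
    delivered, relevant, all_docs = set(delivered), set(relevant), set(all_docs)
    tp = len(delivered & relevant)             # relevant and delivered
    fp = len(delivered - relevant)             # non-relevant but delivered
    fn = len(relevant - delivered)             # relevant but not delivered
    tn = len(all_docs - delivered - relevant)  # non-relevant and not delivered
    return alpha * tp + beta * tn + delta * fp + gamma * fn


docs = {"d1", "d2", "d3", "d4", "d5", "d6"}
utility = filtering_utility(delivered={"d1", "d2", "d3"},
                            relevant={"d1", "d2", "d5"},
                            all_docs=docs)
print(utility)
```

With the default weights, each relevant delivery earns 2 and each non-relevant delivery costs 1, so the example scores 2·2 − 1 = 3.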

Collaborative Filtering

In static and adaptive filtering, users and their profiles are assumed to be independent of each other

Similar users are likely to have similar preferences

Collaborative filtering exploits relationships between users to improve how items (documents) are matched to users (profiles)

Recommender Systems

Recommender systems recommend items that a user may be interested in

Examples

Amazon.com

Netflix

Recommender systems use collaborative filtering to recommend items to users

Recommender System Algorithms

Input

<user, item, rating> tuples for items that the user has explicitly rated

Typically represented as a user-item matrix

Output

<user, item, rating> tuples for items that the user has not rated

Can be thought of as filling in the missing entries of the user-item matrix

Most algorithms infer missing ratings based on the ratings of similar users
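
Filling in a missing entry from the ratings of similar users can be sketched as a similarity-weighted average. This is one simple user-based scheme, assuming cosine similarity between users and strictly positive ratings; the user names, items, and ratings are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Predict a missing rating as a similarity-weighted average over other users."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None


# Hypothetical user-item matrix with a missing entry for (ann, c).
ratings = {
    "ann": {"a": 5, "b": 1},
    "bob": {"a": 5, "b": 1, "c": 5},
    "cat": {"a": 1, "b": 5, "c": 1},
}
predicted = predict(ratings, "ann", "c")
print(predicted)
```

The prediction lands closer to bob's rating than to cat's because ann's existing ratings agree with bob's.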

Recommender Systems

[Figure: example users labeled with known ratings (1 to 5) and unknown ratings (?) that are to be inferred]

Collaborative Searching

Traditional search assumes a single searcher

Collaborative search involves a group of users with a common goal, searching together in a collaborative setting

Example scenarios

Students doing research for a history report

Family members searching for information on how to care for an aging relative

Team members working to gather information and requirements for an industrial project

Collaborative Search

Two types of collaborative search settings, depending on where participants are physically located

Co-located

Participants in same location

Co-Search system

Remote collaborative

Participants in different locations

Search-Together system

Static Filtering with Language Models

Assume profile consists of K relevant documents (Ti), each with weight αi

Probability of a word given the profile is:

KL divergence between profile and document model is used as score:

If -KL(P||D) ≥ θ, then deliver D to P

Threshold (θ) can be optimized for some metric
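
The two formulas are missing from this transcript. The sketch below is one plausible reading, assuming the profile model is the weight-normalized mixture of the smoothed language models of the K relevant documents and that documents are smoothed with Jelinek-Mercer interpolation against a collection model. The smoothing method, `lam`, the toy vocabulary, and the threshold value are assumptions, and all tokens are assumed to be in the collection vocabulary.

```python
import math
from collections import Counter


def smoothed_lm(tokens, collection_lm, lam=0.1):
    """Jelinek-Mercer smoothed unigram model over the collection vocabulary."""
    counts, n = Counter(tokens), len(tokens)
    return {w: (1 - lam) * (counts[w] / n if n else 0.0) + lam * pc
            for w, pc in collection_lm.items()}


def profile_lm(docs, weights, collection_lm, lam=0.1):
    """P(w | P): weighted average of the K relevant documents' models."""
    total = sum(weights)
    models = [smoothed_lm(d, collection_lm, lam) for d in docs]
    return {w: sum(a * m[w] for a, m in zip(weights, models)) / total
            for w in collection_lm}


def neg_kl(profile, doc_model):
    """Score -KL(P || D); values closer to 0 mean a better match."""
    return -sum(p * math.log(p / doc_model[w]) for w, p in profile.items() if p > 0)


def should_deliver(profile, doc_tokens, collection_lm, theta, lam=0.1):
    return neg_kl(profile, smoothed_lm(doc_tokens, collection_lm, lam)) >= theta


# Toy collection model and a profile built from two relevant documents.
collection_lm = {"fish": 0.25, "tank": 0.25, "ball": 0.25, "game": 0.25}
profile = profile_lm([["fish", "tank"], ["fish", "fish"]], [1.0, 1.0], collection_lm)
print(should_deliver(profile, ["fish", "tank", "fish"], collection_lm, theta=-1.0))
print(should_deliver(profile, ["ball", "game"], collection_lm, theta=-1.0))
```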

Adaptive Filtering Models

Rocchio

Profiles treated as vectors

Relevance-based language models

Profiles treated as language models
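
A Rocchio-style profile update can be sketched with sparse term-weight vectors. This is a minimal sketch: the default values of `alpha`, `beta`, and `gamma` are conventional illustrative choices rather than values from the slides, and dropping terms whose weight becomes non-positive is an added assumption.

```python
def rocchio_update(profile, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the profile toward relevant documents and away from non-relevant ones.

    profile and each document are sparse vectors: dict term -> weight.
    """
    def centroid(docs):
        center = {}
        for doc in docs:
            for term, weight in doc.items():
                center[term] = center.get(term, 0.0) + weight / len(docs)
        return center

    rel, non = centroid(relevant), centroid(nonrelevant)
    terms = set(profile) | set(rel) | set(non)
    updated = {
        t: alpha * profile.get(t, 0.0) + beta * rel.get(t, 0.0) - gamma * non.get(t, 0.0)
        for t in terms
    }
    # Assumption: keep only terms with positive weight.
    return {t: w for t, w in updated.items() if w > 0}


# Hypothetical feedback on one delivered relevant and one non-relevant document.
new_profile = rocchio_update(
    {"fish": 1.0},
    relevant=[{"fish": 1.0, "aquarium": 1.0}],
    nonrelevant=[{"recipe": 1.0}],
)
print(new_profile)
```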

Summary of Filtering Models
