Organizing Search Results

Sackler – May 11, 2003. Organizing Search Results. Susan Dumais, Microsoft Research.

DESCRIPTION

Organizing Search Results. Susan Dumais, Microsoft Research. Algorithms and interfaces that improve the effectiveness of search. Beyond ranked lists. Main goal to support search. Also information analysis and discovery. Example applications

TRANSCRIPT

Page 1: Organizing Search Results

Sackler – May 11, 2003

Organizing Search Results

Susan Dumais
Microsoft Research

Page 2: Organizing Search Results

Organizing Search Results
- Algorithms and interfaces that improve the effectiveness of search
  - Beyond ranked lists
  - Main goal to support search
  - Also information analysis and discovery
- Example applications
  - SWISH, results classification
  - GridViz, results summarization
  - SIS, personal landmarks for context

Page 3: Organizing Search Results

Searching with Information Structured Hierarchically (SWISH)
- Collaborators
  - Edward Cutrell, Hao Chen (Berkeley)
- Key themes
  - Going beyond long lists of results
  - Classification algorithms
  - UI techniques
- More about it
  - http://research.microsoft.com/~sdumais

Page 4: Organizing Search Results

Organizing Search Results

Query: “jaguar”

[Figure: List Organization compared with SWISH Category Organization. Category labels shown: Shopping, Automotive, Automotive, Computers.]

Page 5: Organizing Search Results

LookSmart Directory Structure
- ~400k pages; 17k categories; 7 levels
- 13 top-level categories; 150 second-level categories
- Top-level categories (Web Directory)
  - Automotive
  - Business & Finance
  - Computers & Internet
  - Entertainment & Media
  - Health & Fitness
  - Hobbies & Interests
  - Home & Family
  - People & Chat
  - Reference & Education
  - Shopping & Services
  - Society & Politics
  - Sports & Recreation
  - Travel & Vacations
- Second-level categories (Automotive)
  - Buy or Sell a Car
  - Chat
  - Finance & Insurance
  - Magazines & Books
  - Maintenance & Repair
  - Makes, Models & Clubs
  - Motorcycles
  - New Car Showrooms
  - Off-Road, 4X4 & RVs
  - Other Auto Interests
  - Shows & Museums
  - Trucks & Tractors
  - Vintage & Classic

Page 6: Organizing Search Results

SWISH System
- Combines the advantages of
  - Directories - Manually crafted structure but small <~3 million pages>
  - Search engines - Broad coverage but limited metadata <~3 billion pages>
- Project search engine results to category structure
- Two main components
  - Text classification models
  - UI for integrating search results and structure
- Context (category structure) plus focus (search results)
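
As a rough illustration of projecting search results onto a category structure, the Python sketch below groups a result list under category names. It is not the SWISH implementation: the `classify` function is a toy keyword-overlap stand-in for the text classification models, and `CATEGORY_KEYWORDS`, `RESULTS`, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword sets; a real system would use trained classifiers.
CATEGORY_KEYWORDS = {
    "Automotive": {"car", "dealer", "xj", "vehicle"},
    "Computers & Internet": {"software", "os", "mac", "apple"},
    "Shopping & Services": {"buy", "store", "price"},
}

def classify(text, category_keywords):
    """Toy classifier: pick the category sharing the most words with the text."""
    words = set(text.lower().split())
    best = max(category_keywords, key=lambda c: len(words & category_keywords[c]))
    # Fall back to a catch-all group when nothing overlaps at all.
    return best if words & category_keywords[best] else "Uncategorized"

def group_results(results, category_keywords):
    """Project a flat result list onto categories: category -> result titles."""
    groups = defaultdict(list)
    for title, snippet in results:
        category = classify(title + " " + snippet, category_keywords)
        groups[category].append(title)
    return dict(groups)

# Made-up results for the query "jaguar".
RESULTS = [
    ("Jaguar Cars", "official site for the car maker"),
    ("Mac OS X Jaguar", "apple software update"),
    ("Jaguar parts store", "buy parts at a low price"),
]

grouped = group_results(RESULTS, CATEGORY_KEYWORDS)
for category, titles in grouped.items():
    print(category, titles)
```

The category names provide the context and the titles under each one provide the focus, so an ambiguous query separates into its senses.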

Page 7: Organizing Search Results

SWISH Architecture

[Figure: architecture diagram. Train (offline): manually classified web pages are used to build an SVM model. Classify (online): the model is applied to web search results, local search results, ...]

Page 8: Organizing Search Results

Learning & Classification
- Support Vector Machine (SVM)
  - Accurate and efficient for text classification (Dumais et al., Joachims)
  - Model = weighted vector of words
    - “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche …
    - “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads ...
- Hierarchical models for LS directory
  - 1 model for top level; N models for second
- Very useful in conjunction w/ user interaction
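
To make "model = weighted vector of words" and the two-level hierarchy concrete, here is a minimal Python sketch. It assumes each category model is a linear scorer (word weights plus a bias), which is the form a trained linear SVM takes. The weights, biases, and the names `TOP_LEVEL`, `SECOND_LEVEL`, and `classify_hierarchical` are invented for illustration; training is not shown.

```python
def score(weights, bias, text):
    """Linear decision value: sum of the weights of words present, plus a bias."""
    words = set(text.lower().split())
    return sum(w for term, w in weights.items() if term in words) + bias

# One model set for the top level (hypothetical weights, not learned values).
TOP_LEVEL = {
    "Automotive": ({"motorcycle": 1.2, "vehicle": 0.9, "car": 1.0,
                    "honda": 0.8, "porsche": 0.8}, -0.5),
    "Computers & Internet": ({"software": 1.1, "windows": 0.9, "pc": 0.8,
                              "hosting": 0.7, "downloads": 0.6}, -0.5),
}

# One model set per top-level category for the second level.
SECOND_LEVEL = {
    "Automotive": {
        "Motorcycles": ({"motorcycle": 1.5, "harley": 1.3}, -0.5),
        "Maintenance & Repair": ({"parts": 1.2, "repair": 1.4}, -0.5),
    },
}

def classify_hierarchical(text):
    """Pick the best top-level category, then the best child within it."""
    top = max(TOP_LEVEL, key=lambda c: score(*TOP_LEVEL[c], text))
    children = SECOND_LEVEL.get(top, {})
    if not children:
        return (top, None)
    second = max(children, key=lambda c: score(*children[c], text))
    return (top, second)

print(classify_hierarchical("used harley motorcycle for sale"))
print(classify_hierarchical("windows software downloads"))
```

Scoring only the children of the winning top-level category keeps online classification cheap, since most second-level models are never evaluated for a given result.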

Page 9: Organizing Search Results

User Interface Experiments

[Figure: List Organization and Category Organization]

Page 10: Organizing Search Results

[Figure: chart comparing Group Interface and List Interface; axis from 60 to 120. Condition labels: Hover, Inline, No Cat Names, Browse; Hover, Inline + Cat Names.]

Page 11: Organizing Search Results

Effect of Query Difficulty

[Figure: chart comparing Group and List for HARD and EASY queries; axis from 0 to 140.]

Easy queries are faster (p<0.01)

Group faster than List (p<0.01)

Benefit is larger for hard queries (p<0.06)

Page 12: Organizing Search Results

SWISH: Summary and Design Implications
- Text Classification
  - Learn accurate category models
  - Classify new web pages on-the-fly
  - Organize search results
- User Interface
  - Tightly couple search results with category structure
  - User manipulation of presentation of category structure

Page 13: Organizing Search Results

GridViz
- Collaborators
  - George Robertson, Edward Cutrell, Jeremy Goecks (Georgia Tech)
- Key themes
  - Abstract beyond individual results
  - Highly interactive interface to support understanding of trends and relationships
- More about it
  - http://research.microsoft.com/~sdumais

Page 14: Organizing Search Results

GridViz
- Summarize the results of a search
- Grid-based design
  - Axes represent topic, time, people
  - Cells encode frequency, recency
- Supports activities like:
  - What newsgroups are active (on topic x)?
  - What people are active, authoritative (on topic x)?
  - When did I last interact w/ people?
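
One way to picture the grid-based design is as a table keyed by two attributes, where each cell stores a count (frequency) and the latest date (recency). The Python sketch below is an assumption about the data structure only, not GridViz itself. The item fields (`topic`, `person`, `date`), the sample `ITEMS`, and the function `build_grid` are hypothetical.

```python
from datetime import date

def build_grid(items, row_key, col_key):
    """Map (row value, column value) -> {"count": frequency, "latest": newest date}."""
    grid = {}
    for item in items:
        key = (item[row_key], item[col_key])
        cell = grid.setdefault(key, {"count": 0, "latest": item["date"]})
        cell["count"] += 1                                   # frequency
        cell["latest"] = max(cell["latest"], item["date"])   # recency
    return grid

# Made-up search results with topic, person, and time attributes.
ITEMS = [
    {"topic": "search", "person": "alice", "date": date(2003, 4, 2)},
    {"topic": "search", "person": "alice", "date": date(2003, 5, 1)},
    {"topic": "search", "person": "bob", "date": date(2003, 3, 15)},
    {"topic": "ui", "person": "bob", "date": date(2003, 5, 9)},
]

grid = build_grid(ITEMS, "topic", "person")

# "What people are active (on topic x)?" becomes a scan of one grid row.
active_on_search = sorted(
    (person for (topic, person) in grid if topic == "search"),
    key=lambda p: grid[("search", p)]["count"],
    reverse=True,
)
print(active_on_search)
```

Swapping `row_key` and `col_key` for other attributes (for example a time bucket) gives the other axis pairings without changing the cell encoding.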

Page 15: Organizing Search Results

GridViz Demo

Page 16: Organizing Search Results

User Interface Experiments

[Figure: List View and GridViz; charts comparing GridViz and List-view, axes from 0 to 40.]

Page 17: Organizing Search Results

GridViz Summary
- Abstracting beyond individual results
- Highly interactive interface
- Grid-based design
  - Axes represent people, topic, time
  - Cells encode frequency, recency
- Preliminary but promising

Page 18: Organizing Search Results

Stuff I’ve Seen (SIS)
- Collaborators
  - Edward Cutrell, Raman Sarin, JJ Cadiz, Gavin Jancke, Daniel Robbins, Merrie Ringel (Stanford)
- Key themes
  - Your content
  - Information re-use
  - Integration across sources
- More about it
  - … internal for now

Page 19: Organizing Search Results

Search Today …
- Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, intranet)
- Often slow

Page 20: Organizing Search Results

Search with SIS
- Unified index of stuff you’ve seen
  - Unify access to information regardless of source: mail, archives, calendar, files, web pages, etc.
  - Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)
  - Automatic and immediate update of index
  - Rich UI possibilities, since it’s your content
- Architecture
  - Client side indexing and storage
  - Built using MS Search components
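
The idea of a unified index can be sketched as one inverted index over item text, with metadata stored alongside each item regardless of source. The Python sketch below is a minimal in-memory illustration under that assumption and does not reflect how SIS or the MS Search components work. The class `UnifiedIndex`, its methods, and the sample items are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

class UnifiedIndex:
    """Tiny in-memory full-text index with per-item metadata."""

    def __init__(self):
        self.items = {}                   # item id -> metadata dict
        self.postings = defaultdict(set)  # term -> ids of items containing it

    def add(self, item_id, text, **metadata):
        """Index an item as soon as it is seen, whatever its source."""
        self.items[item_id] = metadata
        for term in text.lower().split():
            self.postings[term].add(item_id)

    def search(self, query):
        """Return ids of items containing every query term, newest first."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits, key=lambda i: self.items[i]["date"], reverse=True)

index = UnifiedIndex()
index.add("mail-1", "budget review meeting notes",
          source="mail", author="alice", date=datetime(2003, 5, 1))
index.add("file-7", "budget spreadsheet draft",
          source="file", author="bob", date=datetime(2003, 4, 20))
index.add("web-3", "search engine review",
          source="web", author="carol", date=datetime(2003, 5, 9))

print(index.search("budget"))
print(index.search("review"))
```

Because mail, files, and web pages share one postings table, a single query reaches all of them, and the stored metadata supports sorting or filtering by attributes such as date or author.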

Page 21: Organizing Search Results

SIS Demo

Page 22: Organizing Search Results

SIS Alpha Observations
- 800+ internal users
  - Usage logs (incl different interfaces), survey data
- File types opened
  - 76% Email
  - 14% Web pages
  - 10% Files
- Age of items accessed
  - 7% today
  - 22% within the last week
  - 46% within the last month

[Figure: Item Access Distribution. x-axis: Days Since Item First Seen (0 to 2500); y-axis: Frequency (0 to 120).]

Page 23: Organizing Search Results

SIS Alpha Observations
- Use of other search tools
  - Non-SIS search for web, email, and files decreases
- Importance of people
  - 25% of the queries involve people’s names
- Importance of time
  - Date by far the most popular sort field, followed by rank, author, title
  - Even when rank is the default

[Figure: chart comparing Pre-usage and Post-usage for Files, Email, Web Pages; axis from 0 to 6.]

Page 24: Organizing Search Results

SIS UI Innovations: Timeline w/ Landmarks
- Importance of time
- Timeline interface
- Contextualize results using important landmarks as pointers into human memory
  - General: holidays, world events
  - Personal: important photos, appointments
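
A timeline with landmarks can be thought of as search results and landmark events merged into one date-ordered list. The Python sketch below shows that merge under simple assumptions: landmarks outside the date range of the results are dropped, and the newest entries come first. The function `timeline_with_landmarks` and the sample data are hypothetical, not taken from SIS.

```python
from datetime import date

def timeline_with_landmarks(results, landmarks):
    """Merge dated results with landmarks that fall inside the results' date span."""
    if not results:
        return []
    oldest = min(d for d, _ in results)
    newest = max(d for d, _ in results)
    entries = [(d, "result", title) for d, title in results]
    entries += [(d, "landmark", label) for d, label in landmarks
                if oldest <= d <= newest]
    # Newest first, so landmarks appear between the results they separate.
    return sorted(entries, key=lambda e: e[0], reverse=True)

# Made-up results and landmarks (general: holidays; personal: appointments).
RESULTS = [
    (date(2003, 1, 10), "trip report"),
    (date(2002, 12, 20), "budget memo"),
]
LANDMARKS = [
    (date(2003, 1, 1), "New Year's Day"),
    (date(2002, 7, 4), "Independence Day"),
]

timeline = timeline_with_landmarks(RESULTS, LANDMARKS)
for when, kind, label in timeline:
    print(when.isoformat(), kind, label)
```

Here the holiday sits between the two documents, giving a memory cue ("just before New Year") in addition to the raw dates.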

Page 25: Organizing Search Results

Milestones in Time Demo

Page 26: Organizing Search Results

Milestones in Timeline

[Figure: chart of Search Time (s), axis from 0 to 30, for Landmarks + Dates and Dates Only.]

Page 27: Organizing Search Results

SIS Summary
- Unified index of stuff you’ve seen
  - Fast access to full-text and metadata, from heterogeneous sources
  - Automatic and immediate update of index
  - Rich UI possibilities
- Next steps
  - Better support for tagging -> “flatland”
  - Implicit queries for finding related info, and identifying “Stuff I Should See”
  - Integration with richer activity-based info, Eve

Page 28: Organizing Search Results

Organizing Search Results
- Algorithms and interfaces to improve search
  - Use structure and context
- Examples and key themes
  - SWISH … grouping
  - GridViz … abstraction
  - SIS … personal content and landmarks
- Also
  - Important attributes: People, topics, time
  - Interaction
  - Evaluation
- More information
  - http://research.microsoft.com/~sdumais
  - [email protected]
- Christopher Lee of (SIG)IR … http://www.cdvp.dcu.ie/SIGIR/index.html